Abstract

We consider the file maintenance problem (also called the online labeling problem) in which
n integer items from the set {1, . . . , r} are to be stored in an array of size $m \ge n$. The items
are presented sequentially in an arbitrary order, and must be stored in the array in sorted order
(but not necessarily in consecutive locations in the array). Each new item must be stored in the
array before the next item is received. If $r \le m$ then we can simply store item j in location j,
but if $r > m$ then we may have to shift the location of stored items to make space for a newly
arrived item. The algorithm is charged each time an item is stored in the array, or moved to a
new location. The goal is to minimize the total number of such moves the algorithm has to do.
This problem is non-trivial when $n \le m < r$.

In the case that $m = Cn$ for some $C > 1$, algorithms for this problem with cost $O(\log(n)^2)$ per
item have been given [IKR81, Wil92, BCD+02]. When $m = n$, algorithms with cost $O(\log(n)^3)$
per item were given [Zha93, BS07]. In this paper we prove lower bounds that show that these
algorithms are optimal, up to constant factors. Previously, the only lower bound known for
this range of parameters was a lower bound of $\Omega(\log(n)^2)$ for the restricted class of smooth
algorithms [DSZ05a, Zha93].

We also provide an algorithm for the sparse case: if the number of items is polylogarithmic
in the array size, then the problem can be solved in amortized constant time per item.

1 Introduction 

The file maintenance problem 

In this paper we consider the file maintenance problem in which n integer items from the set
{1, . . . , r} are to be stored in an array of size $m \ge n$. The items are presented sequentially in an
arbitrary order, and must be stored in the array in sorted order (but not necessarily in consecutive
locations). Each new item must be stored in the array before the next item is received. If $r \le m$
then we can simply store item j in location j, but if $r > m$ then we may have to shift the location
of stored items to make space for a newly arrived item. The algorithm is charged each time an
item is stored in the array, or moved to a new location. The goal is to minimize the total number
of such moves the algorithm has to do. This problem is non-trivial when $n \le m < r$.
An alternate formulation is the online labeling problem in which arriving items must be assigned
an integer label in the range [1, m] so that the order on the labels agrees with the numerical ordering 
on the items. The algorithm pays one each time an item is labeled or relabeled. Typically in the 
literature the file maintenance problem refers to the small space regime in which m = O(n). This 
case is the focus of this paper. 

Itai et al. [IKR81] were the first to design an algorithm that maintains an array of size $m = O(n)$
sorted while making only $O(n\log(n)^2)$ moves in total, i.e., in the amortized setting the algorithm makes
$O(\log(n)^2)$ moves per item. Willard [Wil92] improved this algorithm to the worst-case setting of
$O(\log(n)^2)$ moves per item and Bender et al. [BCD+02] further simplified his result. The Itai
et al. approach can be modified so that for an array of size $m = n^{1+\varepsilon}$, $\varepsilon > 0$ constant, it uses
only $O(\log(n))$ moves per item, amortized (folklore). For the case that the array size m is exactly
the number of items n, [Zha93] gave an algorithm that achieves a surprising amortized bound of
$O(\log(n)^3)$ moves per item; this result was simplified in [BS07].

In recent years there has been renewed interest in this problem due to its applications in the 
design of cache-oblivious algorithms, e.g., design of cache-oblivious B-trees [BDFC05, BFJ02] and
cache-oblivious dynamic dictionaries [BDIW04] . However, until now it was not known whether the 
maintenance algorithms for the small space regime can be improved to achieve better amortized 
cost. 



Our results 

In this paper we prove an $\Omega(n\log(n)^2)$ lower bound on the number of moves for inserting n items
into an array of size $m = O(n)$ for any online labeling algorithm, matching the known upper bound up
to constant factors. For the case of array size $m = n + n^{1-\varepsilon}$, we prove the asymptotically optimal
lower bound $\Omega(n\log(n)^3)$.

Our lower bounds are valid even for relatively small r; it is enough that r is bounded below by
a sufficiently large constant times m. (Recall that the problem has a trivial solution of cost n if
$r \le m$.)

Our lower bounds apply to slightly superlinear array size. For example, if $m = O(n\log(n)^{1-\varepsilon})$
one can prove an amortized lower bound $\Omega(\log(n)^{1+\varepsilon/3})$ (though here we need a large range size to
get this lower bound).

Our bound is the first lower bound for general algorithms in the small space regime. Previously
Dietz et al. [DSZ05a, Zha93] proved an amortized $\Omega(\log(n)^2)$ lower bound for the restricted case
of so-called smooth algorithms.

In addition to the lower bounds, we provide a new upper bound in the case that m is a large
function of n. We give an algorithm that, provided that m is at least $2^{\log(n)^k}$ for $k > 3$, has amortized
cost $O(\log(n)/\log\log(m))$. In particular, for any fixed c, $\log(m)^c$ items can be inserted into an array
of size m in constant amortized time.



Previous lower bounds 

There are two previous papers, both by Dietz, Seiferas and Zhang, that give lower bounds for this
problem. The first ([DSZ05a], also available in Zhang's Ph.D. thesis [Zha93]) considers the small
space regime, and proves an $\Omega(\log(n)^2)$ amortized lower bound for a restricted class of algorithms,
called smooth algorithms, which are limited to redistributing items in a uniform fashion. While 
this lower bound is interesting and non-trivial (and introduces several key ideas that we use in 
our lower bound), the restriction to smooth algorithms is significant. The lower bound for smooth 
algorithms is obtained by considering a very simple adversary which exploits the smoothness of the 
algorithm; a non-smooth algorithm can easily handle the given adversary with constant amortized
time per item, as the adversary inserts all the items in decreasing order. There is some confusion
in the literature about this result: the fact that it applies only to a restricted class of algorithms
is sometimes not mentioned (e.g., [BS07]), creating the impression that the general lower bound
result was already established.

The other lower bound for this problem is the amortized $\Omega(\log(n))$ lower bound of Dietz et
al. [DSZ05b] for arrays of size $m = n^{O(1)}$. This lower bound applies to the "intermediate space"
regime (polynomial in the number of items), which is not dealt with in this paper. We have some 
concerns about the correctness of this paper which we have recently raised with one of the authors 
(Seiferas), who agrees that there is an error that does not seem to be readily fixable. (Their result 
consists of two parts: a lower bound for a problem they call bucketing which we believe is correct, 
and a reduction which converts the lower bound for bucketing to a lower bound for online labeling, 
which we have doubts about.) 

Whether or not this reduction turns out to be valid, this paper, like the other paper mentioned 
above, lays out the basic approach and provides important ideas which we make use of in our paper. 

Proof technique 

The general idea for our lower bound (which builds heavily on the above-mentioned work of Dietz 
et al.) is to build an adversary that will force the maintenance algorithm to make many moves 
of items that are already stored in the array. The adversary will attempt to identify a densely 
populated (crowded) segment of the array and load an item that is in the middle of the items
already stored there. Repeated insertions of items with value in this range will eventually force the 
algorithm to move the existing items. 

Deriving a lower bound based on this idea has various complications. The natural notion 
of crowding of a segment is the ratio of stored items to the size of the segment. Whether a 
particular portion of the array is considered to be crowded may depend on the scale of segments 
being considered; there may be a relatively small segment that is very crowded, but larger segments 
containing it are uncrowded. To force the algorithm to work hard, we want to identify a region that 
is crowded at many different scales. This suggests identifying a long nested sequence of segments 
covering a wide range of scales, such that each is crowded. The hope is that loading many items 
having value in the middle of the range of items stored in the smallest nested segment will force 
the algorithm to do costly rearrangements at all of these different scales. 

A straightforward way to accomplish this is to start with the entire array, and successively select 
a nested subsegment having highest density among subsegments of, say, half the size of the current 
segment. This results in a sequence of segments of increasing density, but does not seem to be 
enough to give a good lower bound. The problem arises when successive selected subsegments are 
chosen near the boundary of the parent segment. In this case, the algorithm may be able to relieve 
overcrowding by relatively inexpensive rearrangements that cross the boundary of many segments 
in the sequence into uncrowded segments. To avoid this, the adversary would like to select each 
subsegment in the sequence so that it has a significant buffer to its left and right in the parent 
segment, where each buffer contains a constant fraction of the items in the parent segment. The 
presence of such buffers can be used to ensure that as a segment gets crowded, all of the items 
in either its left or right buffer will have to be moved. The difficulty is that when we insist on 
having these buffers we can no longer ensure that the density of the segments in the sequence does
not decrease (because a given segment in the sequence may have its items concentrated near its
boundary). So we have to allow some decrease in the segment density along the sequence. 
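The straightforward construction described above (repeatedly keep a densest subsegment of half the current length) can be sketched as follows, purely as an illustration of the approach the text argues is insufficient. The 0/1 occupancy encoding, the name `densest_half_chain`, and the leftmost tie-break are our own assumptions, not part of the paper.

```python
from itertools import accumulate


def densest_half_chain(occupied):
    """Naive nested chain: repeatedly keep a densest half-length window.

    occupied[i - 1] is 1 if array cell i holds an item, else 0 (hypothetical
    encoding). Returns 1-indexed inclusive segments (lo, hi), outermost first.
    Ties go to the leftmost window, an arbitrary choice.
    """
    prefix = [0] + list(accumulate(occupied))  # prefix[k] = items in cells 1..k
    lo, hi = 1, len(occupied)
    chain = [(lo, hi)]
    while hi - lo + 1 > 1:
        width = (hi - lo + 1) // 2
        # Candidate windows [a, a + width - 1] inside the current segment.
        best = max(
            range(lo, hi - width + 2),
            key=lambda a: (prefix[a + width - 1] - prefix[a - 1], -a),
        )
        lo, hi = best, best + width - 1
        chain.append((lo, hi))
    return chain


print(densest_half_chain([0, 0, 0, 0, 1, 1, 1, 0]))
# [(1, 8), (4, 7), (5, 6), (5, 5)]
```

In this toy run the last segment (5, 5) sits on the left boundary of its parent (5, 6): nothing in the greedy rule keeps a selected subsegment away from its parent's boundary, which is exactly the lack of buffers discussed above.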

Dietz et al. [DSZ05b] manage to construct such a nested sequence in which each successive
segment has a large left and right buffer. The problem is that the density of segments down the
sequence may decrease by as much as a constant factor at each step, so that if the sequence has logarithmic
length the density decreases by a factor of $n^{\Omega(1)}$. This limits the quality of lower bounds that can
be proved.

The goal then is to define this nested sequence in such a way that we still have large buffers, but 
the density degrades at a much slower rate. Our approach begins with the observation that if for 
a given segment every subsegment having large buffers has density significantly smaller than the 
given segment, then there must be a large subsegment (near the boundary of the given segment) 
that does not have large buffers but does have substantially higher density than the given segment.
This allows us to build a chain of $O(\log(n))$ segments, such that most of the segments have large
buffers with respect to their parents, and the degradation of density along the entire chain can be
bounded by a constant factor. (To give a rough idea of the choice of parameters, when $m = \Theta(n)$,
we allow a decrease in density by a factor of at most $(1 - \Theta(1/\log(n)))$ or an increase by a factor of at
least $(1 + \Theta(1/\log(n)))$ in a single step.)

After identifying such a sequence of nested segments the adversary inserts new items into the
innermost segment. Whenever the maintenance algorithm rearranges some portion of the array
the adversary rebuilds the affected portion of the segment chain. An accounting similar to that of
Dietz et al. [DSZ05a, Zha93] can then be applied on the segments having large buffers, to obtain
the lower bound $\Omega(\log(n)^2)$.

In our actual implementation of this idea we don't explicitly deal with the high-density unbuffered
segments. Rather we construct the sequence of segments in such a way that each segment
in the sequence satisfies a strong uniformity property: No subsegment of the given segment of 
size at least 1/4 of the given segment has density significantly smaller than the given segment. In 
searching for the successor segment of a given segment S with this uniformity property, we first 
restrict to the middle third T of the segment, which we are guaranteed has density close to that 
of S. We then look for a large subsegment of T having density close to that of T and having the 
desired uniformity property. The restriction to T means that when we choose the next segment 
it is guaranteed to have large buffers. To identify the desired subsegment of T we maximize a 
certain potential function defined on the subsegments of T that gives a large subsegment D of 
T that almost has the needed uniformity properties: we get the needed properties by taking the 
middle third of D, and this is the next segment in the sequence. Maximizing the potential function 
implicitly captures the process of successively choosing subsegments of significantly higher density 
until one arrives at a subsegment for which no such selection is possible. 

The $\Omega(\log(n)^3)$ lower bound for an array of size $m = n$ is obtained by iterative application of
the $\Omega(\log(n)^2)$ lower bound, each time for inserting one half of the remaining items. This parallels the
idea used in [Zha93] to obtain a matching upper bound.

For the lower bounds, our adversary needs some room in the range of values to select new items 
that should be inserted into the array. Once there are two keys in the array that are consecutive 
elements in the range of values the adversary cannot choose another element to be stored in between 
them in the array. As he inserts more items into the same position in the array the available room 
shrinks. This limits his power. To mitigate this problem we assign a slightly smaller weight to newer 
items that are inserted. Since our adversary tries to select a sequence of dense nested segments it 
automatically avoids places crowded by newer items. This technique allows us to bound the range 
size. 
2 The model and main result



2.1 A two player game 

In this paper, interval notation is used for sets of consecutive integers, e.g., $[a,b]$ is the set
$\{k \in \mathbb{Z} : a \le k \le b\}$. We consider an array with cells indexed by the set $[1,m]$ in which we store a set
$Y$ of integer-valued keys. A storage function for $Y$ is a map $f : Y \to [1,m]$ that is strictly order
preserving, i.e., for $x, y \in Y$, if $x < y$ then $f(x) < f(y)$. In particular $f$ is one-to-one, so $|Y| \le m$.
Cells that are in the image of the map $f$ are said to be occupied and the others are said to be
unoccupied. A configuration is a pair $(Y, f)$ where $Y$ is a set of keys and $f$ is a storage function for
$Y$.
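The definition of a storage function can be checked mechanically. The sketch below is a minimal Python illustration, assuming keys are Python integers and a storage function is represented as a dict from key to cell; the helper names `is_storage_function` and `occupied_cells` are hypothetical.

```python
def is_storage_function(f, m):
    """Return True if f (dict: key -> cell) is a storage function into [1, m].

    Cells must lie in 1..m and f must be strictly order preserving:
    x < y implies f[x] < f[y], which also makes f one-to-one.
    """
    cells = [f[y] for y in sorted(f)]
    in_range = all(1 <= c <= m for c in cells)
    increasing = all(a < b for a, b in zip(cells, cells[1:]))
    return in_range and increasing


def occupied_cells(f):
    """Cells in the image of f are occupied; all other cells are unoccupied."""
    return set(f.values())


print(is_storage_function({3: 1, 7: 4, 9: 5}, m=5))  # True
print(is_storage_function({3: 2, 7: 1}, m=5))        # False: order reversed
```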

To formalize the array loading problem we define a game $G_n(m,r)$, where $n, m, r$ are positive
integer parameters, which is played by two players, the adversary and the algorithm. The game
is played in a sequence of $n$ time steps. At step $t$, the adversary selects a key $y^t$ from the set
$\{1,\ldots,r\} \setminus \{y^1,\ldots,y^{t-1}\}$, and the algorithm responds with a storage function $f^t$ for the set
$Y^t = \{y^1,\ldots,y^t\}$. We say that key $y^t$ is loaded at step $t$. $(Y^t, f^t)$ is called the configuration at step
$t$.

A key $y$ is relocated at step $t$ if $f^t(y) \ne f^{t-1}(y)$, where $f^{t-1}(y^t)$ is regarded as undefined; in particular
$y^t$ is relocated at step $t$. The set of relocated keys at step $t$ is denoted $Rel^t$. The cost up to step $t$ is
$\chi^t = \sum_{i=1}^{t} |Rel^i|$. Clearly $\chi^t \ge t$ for every $t$. The objective of the algorithm is to minimize $\chi^n$ and
the objective of the adversary is to maximize $\chi^n$. We write $\chi_n(m,r)$ for the smallest cost that can be
achieved by the algorithm against the best adversary.
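The cost accounting can be sketched the same way. Assuming (our convention) that each $f^t$ is a dict from key to cell and that the newly loaded key is simply absent from $f^{t-1}$, $Rel^t$ is the set of keys whose cell differs, and $\chi^n$ sums the sizes of these sets. The names `relocated` and `total_cost` are hypothetical.

```python
def relocated(f_prev, f_cur):
    """Rel^t: keys whose cell changed; the new key y^t is absent from f_prev,
    so f_prev.get(y^t) is None and it always counts as relocated."""
    return {y for y, cell in f_cur.items() if f_prev.get(y) != cell}


def total_cost(configs):
    """chi^n = sum over t = 1..n of |Rel^t|, for storage functions f^0, ..., f^n."""
    return sum(len(relocated(a, b)) for a, b in zip(configs, configs[1:]))


steps = [
    {},                  # f^0: empty array
    {5: 2},              # step 1: load 5 into cell 2
    {2: 1, 5: 3},        # step 2: load 2, shift 5 from cell 2 to cell 3
    {2: 1, 4: 2, 5: 3},  # step 3: load 4 into the free cell 2
]
print(total_cost(steps))  # 4 = 1 + 2 + 1
```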

$G_n(m,r)$ is not well defined if $n > m$ since there can be no storage function once the number of
items exceeds the number of cells. Also, if $m \ge r$, there is a trivial algorithm that achieves optimal
cost $n$ by storing each key $y \in [r]$ in cell $y$. We therefore assume $n \le m < r$.
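For $r \le m$ the trivial strategy can be simulated directly. This is a minimal sketch under a dict-from-key-to-cell representation (the function name is ours); it confirms that no stored key ever moves, so the cost equals the number of keys loaded.

```python
def trivial_strategy_cost(keys, r, m):
    """Load distinct keys from [1, r] by storing key y in cell y (needs r <= m).

    The identity map is strictly order preserving, so it is always a valid
    storage function, and only the newly loaded key is relocated at each step.
    """
    if r > m:
        raise ValueError("the trivial strategy needs r <= m")
    f_prev, cost = {}, 0
    for y in keys:
        if not 1 <= y <= r or y in f_prev:
            raise ValueError("keys must be distinct and lie in [1, r]")
        f_cur = dict(f_prev)
        f_cur[y] = y
        cost += sum(1 for z, cell in f_cur.items() if f_prev.get(z) != cell)
        f_prev = f_cur
    return cost


print(trivial_strategy_cost([5, 1, 3, 2], r=5, m=6))  # 4
```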



2.2 The main theorems 

In this section, we state our lower bound results for $\chi_n(m,r)$. We divide our results into two
theorems, corresponding to the relative size of m and n. 

The first theorem applies whenever $2n \le m$, but it only gives interesting results provided that
m is not too large (slightly superlinear function of n). Here we separately consider two cases. In 
the first case, the range of possible keys is exponential in n. In the second case the range of keys is 
limited to be a constant times m. Despite this strong limitation, the lower bound is only slightly 
worse. 

Theorem 1 There is a constant $C_1$ so that the following holds. Let $m, n$ be integers satisfying
$C_1 \le n$ and $2n \le m$. Let $\delta = n/m$. Then

1. If $r \ge n2^{n-1}$ then $\chi_n(m,r) \ge \frac{n(\ln(n))^2\,\delta}{C_1(\ln(1/\delta))^2}$.

2. If $r \ge C_1 m$ then $\chi_n(m,r) \ge \frac{n(\ln(n))^2\,\delta^2}{C_1(\ln(1/\delta))^2}$.

In both parts, if $m = O(n)$ then the lower bound obtained is $\Omega(n(\ln(n))^2)$. The first bound
gives a nontrivial result (larger than $n$) for $m$ up to $O(n\ln(n)^2/(\ln\ln(n))^2)$, while the second bound
is nontrivial for $m$ up to $O(n\ln(n)/\ln\ln(n))$.

In the next result we consider the case $n < m \le 2n$:

Theorem 2 There are constants $C_0, C_2, C_3$ so that the following holds. Let $m, n$ be integers
satisfying $C_0 \le n < m \le 2n$ and let $\delta = n/m$. Assume $r \ge (1/(1-\delta))^{C_2}\, n$. Then:

$$\chi_n(m,r) \ge \frac{1}{C_3}\, n(\ln(n))^2 \ln\left(\frac{1}{1-\delta}\right). \qquad (1)$$

For $m \le n + n^{1-\varepsilon}$ this gives a tight lower bound of $\Omega(n(\ln(n))^3)$. Observe that for this lower
bound we only need the range of keys to be polynomial in $m$. (A more refined analysis can provide
asymptotically the same lower bound with range size $n + O(n^{1-\varepsilon})$ for this case.)

2.3 Partially loaded arrays and the main lemma 

As the game has been defined, every cell is initially unoccupied. For the proofs of the main theorems, 
it will be convenient to consider a small variant of the game, in which the array is initially partially 
loaded. This version of the game is specified by the parameters $n, m$ (but not $r$) and additionally
takes a set $Y^0$ of keys, whose size is denoted by $n_0$. The array is initially loaded with the set
$Y^0$ and the algorithm selects the initial storage function $f^0$ (at no cost). The game then proceeds
as before, except that the adversary is restricted to loading keys in the range $(\min(Y^0), \max(Y^0))$.
We denote the game by $G_n(m|Y^0)$ and write $\chi_n(m|Y^0)$ for the minimum cost that can be achieved
by the algorithm against the best adversary. We assume that $m \ge n_0 + n$, otherwise there is not
enough room to load all of the keys.

For a set $Y$ of keys, we define $\mathrm{mingap}(Y)$ to be the minimum absolute difference between pairs
of keys in $Y$. As we will see, the following lemma easily implies Theorems 1 and 2.
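As a small illustration of the definition, assuming keys are given as a Python iterable of integers: the minimum absolute difference is always attained by two keys that are adjacent in sorted order. Raising an error for fewer than two keys is our choice, since the text does not define that case.

```python
def mingap(keys):
    """Minimum absolute difference between two distinct keys of the set."""
    ys = sorted(set(keys))
    if len(ys) < 2:
        raise ValueError("mingap needs at least two keys")
    # The closest pair of keys is always adjacent in sorted order.
    return min(b - a for a, b in zip(ys, ys[1:]))


print(mingap({10, 3, 7, 20}))  # 3
```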

Lemma 3 There are positive constants $C_0, C_4$ so that the following holds. Let $m, n, n_0$ be integers
satisfying $C_0 \le n \le n_0$ and $n + n_0 \le m$. Let $\delta_0 = n_0/m$. Assume $\delta_0 \in (\ln(n)^{-2}, 1 - n^{-1/5})$.
Let $Y^0$ be any set of $n_0$ keys. Let $\mu_0 = \mathrm{mingap}(Y^0)$. Assume $\mu_0 \ge 1 + 12/\delta_0$.

1. If $\mu_0 \ge 2^n$ then $\chi_n(m|Y^0) \ge \frac{n(\ln(n))^2\,\delta_0}{C_4(\ln(1/\delta_0))^2}$.

2. If $\mu_0 < 2^n$ then $\chi_n(m|Y^0) \ge \frac{n(\ln(n))^2\,\delta_0^2}{C_4(\ln(1/\delta_0))^2}$.

Remark. In the second part of the lemma, we assume only that $\mu_0 \ge 1 + 12/\delta_0$. If $\mu_0$ is much
larger than $1 + 12/\delta_0$ we can do a simple "black box" modification of the adversary so that an
additional property holds: at the conclusion of the game $\mathrm{mingap}(Y^n) \ge \lfloor \mu_0/\lceil 1 + 12/\delta_0\rceil \rfloor$. Here
is how we modify the adversary. For each pair of keys in $Y^0$ that are adjacent (no intervening key
in $Y^0$) the adversary selects $\lceil 12/\delta_0\rceil$ equally spaced keys and ignores all other keys. Only these
selected keys will be loaded during the game, so effectively the adversary is working with a mingap
of $\lceil 1 + 12/\delta_0\rceil$. Thus Lemma 3 can be applied to this restricted set of keys and at completion, the
mingap is at least $\lfloor \mu_0/\lceil 1 + 12/\delta_0\rceil \rfloor$.
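The key-thinning step in this remark can be sketched as follows. It is an illustration under our own choices: the name `restricted_keys` is hypothetical, and "equally spaced" is realized by rounding $y_L + i\,g/(k+1)$ down, with $k = \lceil 12/\delta_0\rceil$ and $g$ the gap length. Since $k + 1 = \lceil 1 + 12/\delta_0\rceil$ and every gap has length at least $k+1$, consecutive selected keys differ by at least $\lfloor g/(k+1)\rfloor \ge \lfloor \mu_0/(k+1)\rfloor$, which is the mingap bound stated above.

```python
import math


def restricted_keys(y0, delta0):
    """Y^0 together with ceil(12/delta0) roughly equally spaced keys per gap.

    Assumes mingap(Y^0) >= 1 + 12/delta0, i.e. every gap has length >= k + 1.
    """
    k = math.ceil(12 / delta0)
    ys = sorted(y0)
    chosen = set(ys)
    for lo, hi in zip(ys, ys[1:]):
        g = hi - lo
        if g < k + 1:
            raise ValueError("gap too short for the remark's hypothesis")
        # Split the gap into k + 1 near-equal parts; integer division rounds down.
        for i in range(1, k + 1):
            chosen.add(lo + (i * g) // (k + 1))
    return sorted(chosen)


keys = restricted_keys([1, 101, 251], delta0=0.5)  # k = 24, k + 1 = 25
print(len(keys), min(b - a for a, b in zip(keys, keys[1:])))  # 51 4
```

Here $\mu_0 = 100$ and $\lfloor 100/25\rfloor = 4$, matching the smallest spacing printed.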

In the proofs of Theorems 1 and 2 we apply Lemma 3. When we apply the lemma, the parameter
$n$ that appears in the theorem will not be the same as the parameter $n$ that appears in the lemma
(but the meaning of the parameter $m$ does not change). To minimize confusion we will use $N$ to
refer to the parameter $n$ in the theorem being proved.

Proof of Theorem 1: In the argument below we choose $C_1$ large enough depending on $C_4$.

Given $N, m, r$, let $n_0 = \lceil N/2\rceil$ and $n_1 = N - n_0$. Let $B$ be the largest integer such that $n_0 B \le r$.
Let $Y^0 = \{Bt : t \in [1, n_0]\}$. Consider the adversary for $G_N(m,r)$ that during the first $n_0$ steps
loads $Y^0$ and then follows the optimal adversary strategy for the game $G_{n_1}(m|Y^0)$.

For the first part of Theorem 1, the hypothesis that $r \ge N2^{N-1}$ implies $B \ge 2^{n_1}$, so the first
part of Lemma 3 applies. Note that $1 - \delta_0$ is at least $1/2$.
For the second part of Theorem 1, the hypothesis $r \ge C_1 m$ (and our freedom to choose $C_1$)
implies $B \ge \lfloor C_1 m/n_0\rfloor \ge C_1/\delta_0 - 1 \ge 12/\delta_0 + 1$, and so part 2 of Lemma 3 gives the desired lower
bound. □
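The initial key set used in this proof is easy to generate. The following minimal sketch (the function name `initial_keys` is ours) builds $Y^0$ and shows that its mingap is exactly $B$, the quantity that the two parts of the proof compare against $2^{n_1}$ and $12/\delta_0 + 1$.

```python
def initial_keys(N, r):
    """Return (n0, n1, B, Y^0) with n0 = ceil(N/2), n1 = N - n0,
    B the largest integer with n0 * B <= r, and Y^0 = {B t : t in [1, n0]}."""
    n0 = -(-N // 2)  # ceil(N / 2) using integer arithmetic
    B = r // n0
    y0 = [B * t for t in range(1, n0 + 1)]
    return n0, N - n0, B, y0


n0, n1, B, y0 = initial_keys(N=7, r=7 * 2 ** 6)  # r = N * 2^(N-1)
print(n0, n1, B, y0[-1])  # 4 3 112 448
```

In this example $B = 112 \ge 2^{n_1} = 8$, as the first part of the proof requires, and the largest key equals $r$.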

Proof of Theorem 2: As mentioned above, we use $N$ to refer to the parameter $n$ in the theorem
being proved. We describe an adversary strategy for the game $G_N(m,r)$. It will be convenient to
assume that $m \le (1+c)N$ where $0 < c < 1$ is the solution of $\ln((1+c)/c) = 28\ln(3/2)$, which
implies $\ln(1/(1-\delta)) \ge 28\ln(3/2)$. This assumption is permitted since in the remaining case that
$(1+c)N < m \le 2N$, for $\delta = N/m$ we have that $\ln(1/(1-\delta))$, $\delta$ and $\ln(1/\delta)$ can be bounded above and
below by positive constants, and so Theorem 2 follows from Theorem 1.

The adversary works in phases. The adversary initially loads a set $Z_0$ of $N_0 = \lceil N/3\rceil$ keys.
This leaves $s_0 = m - N_0$ empty spaces in the array. In phase 1, the adversary loads $N_1 = \lfloor s_0/3\rfloor$
additional items (according to a strategy described below), and sets $s_1 = s_0 - N_1$, which is the
number of empty spaces in the array after phase 1. In general, after phase $j-1$ the number of
empty spaces remaining will be $s_{j-1}$ and in phase $j$ the adversary loads $N_j = \lfloor s_{j-1}/3\rfloor$ additional
items. We will run the process for $p = \lfloor \ln(1-\delta)/(7\ln(2/3))\rfloor - 1$ phases. By the choice of $c$, $p \ge 3$.
For each $j \le p$, we have $s_j \ge m(2/3)^{j+1} \ge m(1-\delta)^{1/7} \ge m(1-\delta) = m - N$. Hence, there are
enough items to run the $p$ phases. We will show a lower bound for the cost of these phases. If not
all items are loaded during the phases, the additional items will only increase the cost.
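The bookkeeping of the phases can be checked numerically. The sketch below (hypothetical names; exact integer arithmetic for $N_j$ and $s_j$, floating point only for $p$) computes $p$ from $\delta = N/m$, using $1/(1-\delta) = m/(m-N)$, together with the sequences $N_0, \ldots, N_p$ and $s_0, \ldots, s_p$. It does not model which keys the adversary chooses.

```python
import math


def num_phases(N, m):
    """p = floor(ln(1/(1 - delta)) / (7 ln(3/2))) - 1 with delta = N/m < 1."""
    return math.floor(math.log(m / (m - N)) / (7 * math.log(1.5))) - 1


def phase_schedule(N, m, p):
    """Loads [N_0, ..., N_p] and empty-space counts [s_0, ..., s_p]."""
    loads = [-(-N // 3)]         # N_0 = ceil(N / 3)
    spaces = [m - loads[0]]      # s_0 = m - N_0
    for _ in range(p):
        n_j = spaces[-1] // 3    # N_j = floor(s_{j-1} / 3)
        loads.append(n_j)
        spaces.append(spaces[-1] - n_j)
    return loads, spaces


N, m = 10**6, 10**6 + 1
p = num_phases(N, m)             # 3
loads, spaces = phase_schedule(N, m, p)
print(loads)   # [333334, 222222, 148148, 98765]
print(spaces)  # [666667, 444445, 296297, 197532]
```

In this example the number of items loaded after phase $p$ is $m - s_p = 802469 \le N$, consistent with the claim that there are enough items to run all $p$ phases.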

Let $Z_{j-1}$ be the set of items loaded through the end of phase $j-1$ and $z_{j-1} = |Z_{j-1}|$. Thus
$z_{j-1} + s_{j-1} = m$. Let $\delta_{j-1} = |Z_{j-1}|/m = \sum_{i=0}^{j-1} N_i/m$ be the density at the beginning of phase
$j$. During phase $j$, the adversary uses the strategy for $G_{N_j}(m|Z_{j-1})$ provided by the adversary in
Lemma 3. We need to verify that the conditions of the lemma are satisfied.

The parameter $n$ in the lemma is $N_j$ and we need this to be at least $C_0$. For each $j \in [1,p]$,

$$N_j \ge m(2/3)^j/3 - 1 \ge m(1-\delta)^{1/7}/3 - 1 \ge m^{6/7}(m-N)^{1/7}/3 - 1 \ge N^{5/6}, \qquad (2)$$

for $N$ large enough. This is at least $C_0$. The parameter $\delta_0$ in the lemma is $\delta_{j-1}$ and we need that
this is in the interval $((\ln(N_j))^{-2}, 1 - N_j^{-1/5})$. For $N$ large enough $\delta_{j-1} \ge N_0/m \ge 1/4$ and the
lower bound holds. For the upper bound on $\delta_{j-1}$, we have $\delta_{j-1} \le (m - s_{j-1})/m \le (m - 3N_j)/m \le
1 - 3N_j/(2N) \le 1 - 3N_j/(2N_j^{6/5}) \le 1 - N_j^{-1/5}$. Thus $\delta_{j-1}$ satisfies the conditions for $\delta_0$ in Lemma 3.
We also need that in each phase $\mathrm{mingap}(Z_{j-1}) \ge 1 + 12/\delta_{j-1}$. Since $\delta_{j-1} \ge 1/4$ it suffices
that $\mathrm{mingap}(Z_{j-1}) \ge 49$. By the remark following Lemma 3 we may assume that the value of
mingap reduces by a factor of at most 49 in each phase, so in phase $j$ the value of mingap is at
least $r/(N\cdot 49^{j-1})$, so it suffices that $r \ge N\cdot 49^p$. Since $49^p \le (1/(1-\delta))^{C_2}$ for an appropriate
constant $C_2$, the hypothesis on $r$ in the Theorem is sufficient.

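We now show a lower bound on the cost of a single phase $j$. First, $N_j = \lfloor s_{j-1}/3\rfloor = \lfloor m(1-\delta_{j-1})/3\rfloor \ge
m(1-\delta_{j-1})/4$, for $N$ large enough. We will use the fact that for any real $x \in [1/4, 1]$,
$\ln(1/x) \le 3(1-x)$. From Lemma 3 we obtain the following lower bound on the cost of phase $j$:

$$\frac{N_j(\ln(N_j))^2\,\delta_{j-1}^2}{C_4(\ln(1/\delta_{j-1}))^2}
\ge \frac{m(1-\delta_{j-1})(\ln(N_j))^2}{4\cdot 4^2\cdot C_4(\ln(1/\delta_{j-1}))^2}
\ge \frac{m(1-\delta_{j-1})(\ln(N^{5/6}))^2}{4\cdot 4^2\cdot C_4\cdot 3^2(1-\delta_{j-1})^2}
\ge N(\ln(N))^2/C_7,$$

for some constant $C_7$.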
Since the number of phases is at least $\ln(1/(1-\delta))/(14\ln(3/2))$, we obtain the required lower
bound. □
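The elementary inequality $\ln(1/x) \le 3(1-x)$ on $[1/4, 1]$, which is where $\delta_{j-1} \ge 1/4$ is used, can be sanity-checked on a grid. This is only a numerical check with a small tolerance for rounding, not a proof. (A proof: $g(x) = 3(1-x) - \ln(1/x)$ has $g'(x) = 1/x - 3$, so $g$ increases on $[1/4, 1/3]$ and decreases on $[1/3, 1]$; its minimum on the interval is at an endpoint, and $g(1/4) > 0 = g(1)$.)

```python
import math


def log_bound_holds(x):
    """ln(1/x) <= 3(1 - x), the inequality used in the phase cost bound."""
    return math.log(1 / x) <= 3 * (1 - x) + 1e-12


grid = [0.25 + 0.75 * i / 10_000 for i in range(10_001)]
print(all(log_bound_holds(x) for x in grid))  # True on [1/4, 1]
print(log_bound_holds(0.01))                  # False: the range restriction matters
```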

It remains to prove Lemma 3.

3 Some preliminaries for the lower bound 

3.1 Segments, time intervals, key intervals, and lazy algorithms 

We use the following terminology:

• A segment is a subinterval of the set of cells $[1,m]$.

• A time interval is a subinterval of $[0,n]$.

• A key interval is a subinterval of the set $[1,r]$ of keys. If $Y \subseteq [1,r]$ is any set of keys, a
$Y$-interval is a set of the form $Y \cap I$ where $I$ is a key interval.

Recall that at step $t$, $Rel^t$ denotes the set of keys relocated at step $t$. For $y \in Rel^t$ the trail of
$y$ at step $t$ is the segment $Trail^t(y)$ between $f^{t-1}(y)$ and $f^t(y)$; for $y^t$ it is just the location $f^t(y^t)$.

The busy region at step $t$, denoted $B^t$, is the union over $y \in Rel^t$ of $Trail^t(y)$.

We say that an algorithm is lazy if $B^t$ is a segment at every step $t$. The following proposition says that we
may restrict attention to lazy algorithms.
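Representing storage functions as dicts from key to cell (an assumption, not the paper's notation), trails, the busy region and the laziness condition can be computed directly: a run is lazy when `is_segment(busy_region(f_prev, f_cur))` holds at every step. The names `busy_region` and `is_segment` are hypothetical.

```python
def busy_region(f_prev, f_cur):
    """B^t: union over relocated keys y of Trail^t(y), returned as a set of cells."""
    cells = set()
    for y, new in f_cur.items():
        old = f_prev.get(y)
        if old == new:
            continue                  # y was not relocated
        if old is None:               # y is the newly loaded key y^t
            cells.add(new)
        else:                         # trail: all cells between old and new position
            cells.update(range(min(old, new), max(old, new) + 1))
    return cells


def is_segment(cells):
    """True if a nonempty set of cells forms one contiguous segment."""
    return bool(cells) and max(cells) - min(cells) + 1 == len(cells)


lazy_step = busy_region({10: 1, 20: 5}, {10: 1, 15: 4, 20: 6})
split_step = busy_region({10: 1, 20: 5}, {10: 2, 15: 3, 20: 8})
print(sorted(lazy_step), is_segment(lazy_step))  # [4, 5, 6] True
print(is_segment(split_step))                    # False: cell 4 is not busy
```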

Proposition 4 Given any algorithm $A$ there is a lazy algorithm $A'$ such that for any initial key
set $Y^0$ and any key sequence $y = (y^1, \ldots, y^n)$ the cost of $A'$ on $Y^0, y$ is at most the cost of $A$ on
$Y^0, y$.

Proof: The idea is that if the busy region $B^t$ is a union of two or more disconnected segments,
then any relocation outside of the segment that contains $f^t(y^t)$ can be deferred until later.

To make this precise, the algorithm $A'$ keeps track of the storage function $g^t$ that would be
produced by the algorithm $A$. Let $f^t$ be the storage function actually produced by $A'$.

Initially, prior to step 1, $f^0 = g^0$. At each step $t$, $A'$ updates $g^{t-1}$ to $g^t$ based on algorithm $A$.
It then produces $f^t$ as follows: For a segment $T$ let $K(T)$ be the keys stored in $T$ under $f^{t-1}$. Let $S$
be the smallest segment containing $g^t(y^t)$ (the location chosen by $A$ to store $y^t$) with the property
that $K(S) \cup \{y^t\}$ is the same as the set of elements stored in $S$ in $g^t$. The algorithm then defines
$f^t$ so that every key in $K(S) \cup \{y^t\}$ is stored according to $g^t$, and every other key is stored according
to $f^{t-1}$.
The definition of $S$ ensures that the busy region $B^t$ of $A'$ will be exactly the segment $S$. (Clearly
$B^t \subseteq S$; to see that it is equal suppose there is a location $j$ in $S$ that is not in $B^t$. Without loss
of generality $j$ is left of $f^t(y^t)$. Then if we shrink $S$ by moving its left endpoint to $j+1$, the
resulting segment contradicts the choice of $S$.)

Finally, we need to show that the cost of relocations by $A'$ is no more than the cost of relocations
by $A$. For this, consider all relocations of a fixed key $y$. An easy induction shows that up through
the end of any step $t$ the number of times $y$ was relocated by $A'$ is less than or equal to the number
of times $y$ was relocated by $A$, with a strict inequality if $f^t(y) \ne g^t(y)$. □

Henceforth we assume that the algorithm is lazy, and refer to $B^t$ as the busy segment at step $t$.
4 Suitable gaps and segment table strategies



In this section we describe the high-level structure of the adversary strategy, and state several 
parametrized properties that we will use about the strategy. 

During each step $t$ the adversary must choose a key $y^t$ to load into the array. For a set $Y$ of
keys, a $Y$-gap is a pair $y_L < y_R$ of keys belonging to $Y$ such that no key of $Y$ has value in the
key interval $(y_L, y_R)$. The gap length is $y_R - y_L$. Provided that the gap length is at least 2, there
is always a key between $y_L$ and $y_R$ that is available to be loaded. We call such a gap suitable. A
suitable segment is one that contains a suitable gap. Our adversary will choose a suitable segment,
identify the largest suitable gap $(y_L, y_R)$ stored in the segment and select the key $\lfloor (y_L + y_R)/2\rfloor$,
which is the midpoint of the gap rounded down to the nearest integer. The segment (resp., gap)
chosen by the adversary at step $t$ is referred to as the chosen segment (resp., gap) at step $t$.

To head off possible confusion, we emphasize that a gap refers to the set of possible key values
between $y_L$ and $y_R$ and not to the region of the array in which the keys are stored.
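The adversary's key choice inside a chosen segment can be sketched as follows. Because a storage function is order preserving, the keys stored in a segment form a $Y$-interval, so consecutive keys in sorted order are exactly the $Y$-gaps stored there. The function name `choose_key` and the leftmost tie-break among equally long gaps are our assumptions; the text does not specify a tie-break.

```python
def choose_key(stored_keys):
    """Midpoint (rounded down) of the longest suitable gap among stored keys.

    stored_keys: the keys stored in the chosen segment. Returns None if no
    gap has length >= 2, i.e. the segment is not suitable.
    """
    ys = sorted(stored_keys)
    best = None
    for lo, hi in zip(ys, ys[1:]):
        # Strict '>' keeps the leftmost gap among gaps of equal length.
        if hi - lo >= 2 and (best is None or hi - lo > best[1] - best[0]):
            best = (lo, hi)
    if best is None:
        return None
    return (best[0] + best[1]) // 2  # strictly between y_L and y_R


print(choose_key([3, 4, 10, 13]))  # 7, from the gap (4, 10)
print(choose_key([5, 6, 7]))       # None: no suitable gap
```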

The reader should think of step t as consisting of the following sequence of events. 

1. The configuration $(Y^{t-1}, f^{t-1})$ was specified prior to time step $t$. We refer to the associated
configuration and density functions as the configuration at the end of step $t-1$ or at the beginning
of step $t$. We emphasize that the configuration at the beginning of step $t$ is $(Y^{t-1}, f^{t-1})$
and not $(Y^t, f^t)$.

2. During the first part of step $t$, the adversary selects a suitable segment $S$ (with respect to
configuration $(Y^{t-1}, f^{t-1})$). We say that such a segment is suitable for step $t$.

3. The adversary chooses the largest gap in $S$ and lets $y^t$ be the rounded midpoint. $Y^t$ is set to
be $Y^{t-1} \cup \{y^t\}$.

4. The algorithm selects the storage function $f^t$ for $Y^t$.

The choice of the suitable segment at step $t$ will depend on the configuration $(Y^{t-1}, f^{t-1})$.
Intuitively, the adversary will select a suitable segment that is currently located in an area of the
array that is relatively "crowded". For this purpose we fix a real parameter $\lambda > 0$ called the weight
parameter, and define the weight of a key $y$ to be 1 if $y \in Y^0$ and $\lambda$ otherwise. Given a configuration
$(Y, f)$, we define the following functions on segments $S \subseteq [1,m]$:

• The weight $w(S) = w(S, f)$ is the total weight of all keys stored in $S$ under $f$.

• The density $\rho(S) = \rho(S, f)$ is $w(S)/|S|$. The density function provides a natural measure of
crowding of $S$.

We write $w^{t-1}(S)$ and $\rho^{t-1}(S)$ for the weight and density of segment $S$ with respect to the
configuration $(Y^{t-1}, f^{t-1})$.
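The weight and density functions can be sketched directly, assuming a storage function is a dict from key to cell, a segment is an inclusive pair $(lo, hi)$, and `lam` stands for the weight parameter $\lambda$; the function names are hypothetical. Down-weighting later keys is what makes a region crowded only by recently loaded keys look less dense.

```python
def weight(segment, f, y0, lam):
    """w(S): total weight of keys stored in S = [lo, hi]; keys of Y^0 weigh 1,
    every later key weighs lam."""
    lo, hi = segment
    return sum(1 if y in y0 else lam for y, cell in f.items() if lo <= cell <= hi)


def density(segment, f, y0, lam):
    """rho(S) = w(S) / |S|."""
    lo, hi = segment
    return weight(segment, f, y0, lam) / (hi - lo + 1)


f = {10: 1, 15: 2, 20: 4}  # keys 10 and 20 are initial, 15 was loaded later
y0 = {10, 20}
print(weight((1, 4), f, y0, lam=0.5), density((1, 4), f, y0, lam=0.5))  # 2.5 0.625
```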

We now describe how the weight parameter is chosen. The weight parameter depends on two
things: $\mathrm{mingap}(Y^0)$, and an auxiliary parameter $\delta^*$, called the density lower bound parameter. (This
will turn out to be a lower bound on the density of certain segments that arise in the definition of 
the adversary). 

Weight parameter specification. Let $\delta^* > 0$ be the auxiliary density lower bound parameter
(to be specified later). Let $Y^0$ be a set of keys with $\mathrm{mingap}(Y^0) \ge 1 + 4/\delta^*$. If $\mathrm{mingap}(Y^0) \ge 2^n$
then $\lambda = 1$; otherwise $\lambda = \delta^*/2$.

This choice of parameters gives the following lemma, which establishes sufficient conditions on
a segment to contain a suitable gap. 

Lemma 5 Let $\delta^*$ and $\lambda$ be as given in the weight parameter specification. Let $\mu_0 = \mathrm{mingap}(Y^0)$
and assume $\mu_0 \ge 1 + 4/\delta^*$. Let $t \in [1,n]$ be any step and let $S$ be any segment whose weight (with
respect to $w^{t-1}$) is at least 2 and whose density is at least $\delta^*$. Then $S$ is suitable for step $t$.

Proof: First consider the case that $\mu_0 \ge 2^n$, and so $\lambda = 1$. An easy induction shows that the
minimum gap in $Y^i$ is at least $2^{n-i}$, and so after step $t-1$ the minimum gap is at least 2, so every
gap is suitable. Since $S$ has weight at least 2 it contains at least one gap.

Next consider the case that $\mu_0 < 2^n$. Let $A$ be the set of keys from $Y^0$ stored in $S$ after step
$t-1$ and $B$ be the set of other keys stored in $S$. Let $a = |A|$ and $b = |B|$. We first claim that
$a \ge 2$. The weight of $S$ is $a + b\lambda = |S|\rho(S) \ge (a+b)\delta^*$, which implies $a > b(\delta^* - \lambda) = b\lambda$, which
implies $a > (a + b\lambda)/2 = w^{t-1}(S)/2$, which is at least 1 by hypothesis. Thus $a \ge 2$.

Let $\min_A$ and $\max_A$ be the smallest and largest keys in A. Suppose for contradiction that there is no suitable gap between $\min_A$ and $\max_A$. Then all of the $\max_A - \min_A + 1$ keys in the range $[\min_A, \max_A]$ must have been loaded already. There are $a - 1$ gaps between keys of A, each of size at least $\mu_0$, so we must have $b \ge (a - 1)(\mu_0 - 1) \ge (a/2)(4/\delta^*) = a/\lambda$, which contradicts $a > b\lambda$. □
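The counting behind the first claim in this proof (that $a \ge 2$) is easy to check exhaustively on small cases. The sketch below is illustrative only. Like the proof, it assumes that a segment holding a keys of $Y^0$ and b later keys has weight $a + b\lambda$. It also takes $\lambda = \delta^*/2$ in the case $\mu_0 < 2^n$, which is the value the identity $b(\delta^* - \lambda) = b\lambda$ requires. The function name `first_claim_holds` is hypothetical.

```python
from fractions import Fraction


def first_claim_holds(size: int, a: int, b: int, delta_star: Fraction) -> bool:
    """Check the first claim in the proof of Lemma 5 on one configuration.

    The segment has `size` cells and holds a keys of Y^0 plus b later keys.
    Returns True if the hypotheses (weight >= 2, density >= delta*) fail,
    or if they hold and a >= 2, as the proof asserts.
    """
    lam = delta_star / 2              # assumed weight parameter when mu_0 < 2^n
    w = a + b * lam                   # weight of the segment
    if w < 2 or w / size < delta_star:
        return True                   # hypotheses of Lemma 5 not met
    return a >= 2


# Exhaustive check over small segments and several values of delta*.
ok = all(
    first_claim_holds(size, a, b, Fraction(p, 10))
    for p in range(1, 10)
    for size in range(1, 30)
    for a in range(size + 1)
    for b in range(size + 1 - a)      # a + b <= size: at most one key per cell
)
print(ok)  # True
```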

The adversary we describe will identify a segment satisfying the conditions of Lemma 5 (and other conditions as well). The strategy is based on a structure called a segment table. A segment table is an array with n columns (one for each step) whose entries are array segments. The entries of the table are colored green or red. The rows of the segment table are called levels and the number of levels, which we normally denote by d, is the depth. The level index increases from the top to the bottom of the table. A level-step pair $(i,t) \in [1,d] \times [1,n]$ is called a site, and the table entry (segment) at site $(i,t)$ is denoted $S_i^t$. It is sometimes convenient to consider $(0,t)$ for $t \in [1,n]$ to be a site even though there is no corresponding table entry. The segment table must satisfy:

• The segments down each column are nested: $S_1^t \supseteq \cdots \supseteq S_d^t$.

• The color of an entry in column t is defined as follows: $S_i^t$ is green if $B^t \subseteq S_i^t$ (recall that $B^t$ is the busy segment at step t) and is red otherwise. Together with the nesting property this implies that each column consists of a (possibly empty) sequence of green entries followed by a (possibly empty) sequence of red entries.

• If segment $S_i^t$ is green then $S_i^{t+1} = S_i^t$.
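As a small illustration of the coloring rule, and of why nesting forces each column to be a run of greens followed by a run of reds, here is a sketch. It encodes a segment as an inclusive pair of cell indices, which is an assumed encoding and not the paper's, and it does not model the third rule (copying green entries into the next column).

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive range [lo, hi] of array cells (assumed encoding)


def contains(outer: Segment, inner: Segment) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def color_column(column: List[Segment], busy: Segment) -> List[str]:
    """Color the entries S_1^t, ..., S_d^t of one column given the busy segment B^t."""
    for upper, lower in zip(column, column[1:]):
        assert contains(upper, lower), "segments must be nested down the column"
    return ["green" if contains(seg, busy) else "red" for seg in column]


def green_prefix(colors: List[str]) -> bool:
    """True if the column is a run of greens followed by a run of reds."""
    first_red = colors.index("red") if "red" in colors else len(colors)
    return all(c == "red" for c in colors[first_red:])


column = [(1, 64), (9, 40), (17, 32), (21, 26)]
colors = color_column(column, busy=(18, 30))
print(colors)                # ['green', 'green', 'green', 'red']
print(green_prefix(colors))  # True
```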

For each level i, we partition $[1,n]$ into intervals called i-epochs, where each red site at level i marks the end of an i-epoch. The last i-epoch (which ends at step n) is called the terminal i-epoch and the others are non-terminal.

An i-epoch E is identified with the set $\{i\} \times E$ of sites. All sites but the last site in the epoch are green. For a non-terminal epoch the last site is red; for a terminal epoch the last site may be red or green.

For an i-epoch E, the left endpoint of E is called the starting time and is denoted $s(E)$, and the right endpoint is called the closing time and is denoted $c(E)$.
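The epoch structure of one level can be read directly off that level's row of colors. The following sketch uses a hypothetical helper `epochs` and 1-based steps. It returns the (starting time, closing time) pairs of the i-epochs, with the terminal epoch last.

```python
from typing import List, Tuple


def epochs(row: List[str]) -> List[Tuple[int, int]]:
    """Split steps 1..n of one level into i-epochs.

    row[t-1] is the color of site (i, t). Each red site closes an epoch;
    the last epoch (ending at step n) is the terminal one, whatever its color.
    """
    result, start = [], 1
    for t, color in enumerate(row, start=1):
        if color == "red":
            result.append((start, t))
            start = t + 1
    if start <= len(row):
        result.append((start, len(row)))   # terminal epoch ending at a green site
    return result


row = ["green", "red", "green", "green", "red", "green"]
print(epochs(row))   # [(1, 2), (3, 5), (6, 6)]
```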

By the properties of the segment table, every site of E is associated to the same segment, which is denoted $S^E$. Since for each column every site above a green site is green, epoch E is a subset of an epoch at level $i - 1$, so all sites in $\{i-1\} \times E$ are associated to the same segment.

Figure 1: Segment table.



For a site $(i,t)$ we write $E(i,t)$ for the i-epoch containing t, and $s(i,t)$ and $c(i,t)$ for the starting time and closing time of that epoch.

We will specify an adversary that builds the segment table by constructing column t during step t. For our adversary, the events of step t described earlier can be refined as follows:

• The configuration $(Y^{t-1}, f^{t-1})$ was specified prior to time step t.

• During the beginning portion of time step t, the adversary chooses the segments for column t. This selection is based on the configuration $(Y^{t-1}, f^{t-1})$ and functions derived from it such as $\rho^{t-1}$. The segments will be chosen in such a way that $S_d^t$ satisfies the hypothesis of Lemma 5 (with respect to $(Y^{t-1}, f^{t-1})$), and therefore contains a suitable gap.

• The adversary chooses the largest gap in $S_d^t$ and lets $y^t$ be the approximate midpoint. $Y^t$ is set to be $Y^{t-1} \cup \{y^t\}$.

• The algorithm selects the storage function $f^t$ for $Y^t$.

• The choice of $f^t$ together with the previous storage function determines the busy segment $B^t$.

• Each segment $S_i^t$ in column t is colored green (if $S_i^t$ contains $B^t$) or red (if $S_i^t$ does not contain $B^t$).

A procedure for the adversary to choose column t given $(Y^{t-1}, f^{t-1})$ completely determines an adversary strategy. We call such a strategy a segment table strategy.

We will specify and analyze a particular segment table strategy. We begin by identifying some 
additional properties we want our table to satisfy, and then show how these properties lead to a 
lower bound on the cost incurred by the algorithm. 
We use the segment table to help account for the cost of the relocations done by the algorithm. We partition $Rel^t$ (the set of keys relocated at step t) into subsets $Q_0^t, \ldots, Q_d^t$ as follows. For $i \ge 1$, $Q_i^t$ is the set of $y \in Rel^t$ such that $f^{t-1}(y) \in S_i^t - S_{i+1}^t$ (i.e., $S_i^t$ is the smallest segment in column t that contains the location that y was moved from; here $S_{d+1}^t = \emptyset$). We include $y^t$ in $Q_d^t$. $Q_0^t$ is the set of those y that were moved from a location outside of $S_1^t$. Let $q_i^t = |Q_i^t|$. For an i-epoch E at level i we define $q^E = \sum_{t \in E} q_i^t$, which is the total cost of relocations associated to E. Thus the cost incurred by the algorithm is $\sum_{i=0}^{d} \sum_E q^E$, where the inner sum is over all i-epochs.
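The partition of $Rel^t$ can be expressed as a lookup of each old location against the nested column. The sketch below is a minimal version under stated assumptions. Segments are inclusive pairs, and the caller passes $f^{t-1}(y)$ for every relocated key other than the new key $y^t$, which is added to $Q_d^t$ separately. The names `level_of` and `relocation_counts` are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive [lo, hi]


def level_of(prev_location: int, column: List[Segment]) -> int:
    """Return i with the old location in S_i^t but not in S_{i+1}^t.

    column = [S_1^t, ..., S_d^t] (nested). Level 0 means the key was moved
    from outside S_1^t; S_{d+1}^t is treated as empty.
    """
    level = 0
    for i, (lo, hi) in enumerate(column, start=1):
        if lo <= prev_location <= hi:
            level = i
        else:
            break            # nesting: no deeper segment can contain it either
    return level


def relocation_counts(prev_locations: List[int], column: List[Segment]) -> List[int]:
    """q_i^t for i = 0..d, given f^{t-1}(y) for each y in Rel^t other than y^t."""
    counts = [0] * (len(column) + 1)
    for loc in prev_locations:
        counts[level_of(loc, column)] += 1
    counts[len(column)] += 1            # the new key y^t is put in Q_d^t
    return counts


column = [(1, 64), (9, 40), (17, 32)]
print(relocation_counts([5, 10, 20, 70], column))   # [1, 1, 1, 2]
```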

We are now ready to state the desired properties of the segment table. These properties depend on three strategy parameters: $\delta^*$ (the density lower bound parameter introduced earlier, which determines the weight parameter $\lambda$), $\gamma$ and $\alpha$. These parameters will be chosen later.

(P1) The number of levels d is an integer greater than or equal to 8.

(P2) All segments have size at most $n/2$.

(P3) For each t and $i \ge 2$, $|S_i^t| \le |S_{i-1}^t|/2$. (Segment sizes decrease by at least a constant factor down columns.)

(P4) All segments have size at least $1/\gamma$.

(P5) $\rho^{t-1}(S_1^t) \ge \delta_0 e^{-\alpha}$ and for $i \ge 2$, $\rho^{t-1}(S_i^t) \ge e^{-\alpha}\rho^{t-1}(S_{i-1}^t)$. (Segment densities do not decrease much down columns.)

(P6) Every segment in the table has density at least $\delta^*$.

(P7) For any non-terminal i-epoch E with starting time s, $q^E \ge \frac{1}{8}w^{s-1}(S^E)$; that is, the relocation cost associated with epoch E is at least a 1/8 fraction of the weight of the associated segment $S^E$ at the start of the epoch.

One of the constraints we will impose on the parameters (condition (I1) below) is $\delta^* \ge 2\gamma$. With this constraint, (P4) and (P6) together with Lemma 5 guarantee that the segments $S_d^t$ are suitable, since their weight is at least $|S_d^t|\delta^* \ge \delta^*/\gamma \ge 2$.

We will prove two lemmas. The first lemma (Lemma 6) gives a lower bound on the cost incurred by the algorithm against a segment table strategy that satisfies the above properties, in terms of the parameters $\gamma$, $\alpha$ and $\delta^*$ in the properties. The second (Lemma 11) shows that there is a segment table strategy that satisfies the above properties with suitable values of the parameters. Finally, in Section 6.4 we use these two lemmas to prove Lemma 3.



5 A segment table strategy gives a good lower bound 

In this section we prove a lemma that gives a lower bound on the cost incurred by an algorithm against a segment table strategy. The lemma encapsulates and extends the main accounting argument of Dietz et al. [DSZ05a, Zha93], which they used to prove an $\Omega(\ln^2(n))$ amortized lower bound for the special case of smooth algorithms.

Lemma 6 Let m, n, $n_0$, $\delta_0$ and $Y^0$ be as in Lemma 3. Let $\alpha$, $\delta^*$, $\gamma$ be positive parameters and let $\lambda$ be the associated weight parameter as defined earlier. If a segment table strategy produces a segment table with d levels satisfying (P1)-(P7), then the cost $\chi$ incurred by the algorithm satisfies

$$\chi \ge \frac{\delta^* \lambda n d^2}{500\,(\alpha d + \gamma\lambda + 1 - \delta_0)}.$$

Remark. When we choose values for these parameters, in the case that $\delta_0$ is a constant in $(0,1)$, $\delta^*$ and $\lambda$ will be bounded below by positive constants, and the denominator will be bounded above by a positive constant. Also d will be $\Theta(\ln(n))$, yielding an $\Omega(n\ln(n)^2)$ bound.
Proof: Let $\chi$ denote the cost of a given algorithm against a segment table strategy as in the lemma. As noted above, $\chi \ge \sum_{i=0}^{d} \sum_E q^E$. Let $E^+$ denote the set consisting of the largest $\lceil |E|/2 \rceil$ time steps belonging to E. We will spread the cost of epoch E among the sites corresponding to $E^+$. It is easy to check that for each $t \in E^+$, $|E^+| \le t - s(E) + 1$, and thus:

$$q^E \ge \sum_{t \in E^+} \frac{q^E}{t - s(E) + 1}.$$
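The choice of $E^+$ is what makes each share $q^E/(t - s(E) + 1)$ at most $q^E/|E^+|$. The short sketch below computes $E^+$ for an epoch $[s, c]$ and checks this inequality exhaustively on small epochs with exact rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction


def upper_half(start: int, close: int) -> list:
    """E^+: the largest ceil(|E|/2) steps of the epoch E = [start, close]."""
    length = close - start + 1
    half = (length + 1) // 2
    return list(range(close - half + 1, close + 1))


def spread_is_valid(start: int, close: int, q: Fraction) -> bool:
    """Check |E^+| <= t - s(E) + 1 for t in E^+, hence sum_t q/(t - s + 1) <= q."""
    plus = upper_half(start, close)
    if any(len(plus) > t - start + 1 for t in plus):
        return False
    return sum(q / (t - start + 1) for t in plus) <= q


print(upper_half(3, 9))   # [6, 7, 8, 9]
print(all(spread_is_valid(s, c, Fraction(1)) for s in range(1, 20) for c in range(s, 40)))  # True
```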

Say that a site $(i,t)$ is chargeable if (i) $i \in [2, d-1]$, (ii) the i-epoch $E(i,t)$ containing t is a non-terminal epoch (not the final epoch at level i), and (iii) $t \in E(i,t)^+$. Let CS denote the set of chargeable sites. From (P7) and (P6) we have that for a non-terminal i-epoch E, $q^E \ge \frac{1}{8}w^{s(E)-1}(S^E) \ge \delta^*|S^E|/8$, and so:

$$\chi \ge \frac{\delta^*}{8} \sum_{(i,t) \in CS} \frac{1}{(t - s(i,t) + 1)/|S_i^t|},$$

where $s(i,t)$ is the starting time of the i-epoch containing t.

We use the following standard fact (the arithmetic-harmonic mean inequality):

Proposition 7 For $a_1, a_2, \ldots, a_p > 0$,

$$\sum_{i=1}^{p} \frac{1}{a_i} \ge \frac{p^2}{\sum_{i=1}^{p} a_i}.$$
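In the proof, Proposition 7 is applied with one term $a_{(i,t)} = (t - s(i,t) + 1)/|S_i^t|$ per chargeable site, so $p = |CS|$. A quick randomized check in exact arithmetic is shown below. The helper names are hypothetical.

```python
import random
from fractions import Fraction


def harmonic_sum(values):
    """Left side of Proposition 7: the sum of 1/a_i."""
    return sum(1 / v for v in values)


def am_hm_bound(values):
    """Right side of Proposition 7: p^2 divided by the sum of the a_i."""
    return Fraction(len(values) ** 2) / sum(values)


random.seed(0)
for _ in range(500):
    vals = [Fraction(random.randint(1, 50), random.randint(1, 50))
            for _ in range(random.randint(1, 8))]
    assert harmonic_sum(vals) >= am_hm_bound(vals)

example = [Fraction(1), Fraction(2), Fraction(4)]
print(harmonic_sum(example), am_hm_bound(example))  # 7/4 9/7
```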



The length of every epoch E is at most $|S^E|$ (since at most $|S^E|$ keys can be stored in $S^E$ before some keys are moved outside of $S^E$), and so (P2) implies that all epochs (including terminal epochs) have length at most $n/2$. Thus at each level the union of the non-terminal epochs has size at least $n/2$, and so the number of chargeable sites at a given level i is at least $n/4$. Thus $|CS| \ge n(d-2)/4$. Applying the proposition together with (P1) gives

$$\chi \ge \frac{\delta^* n^2 (d-2)^2}{128 \sum_{(i,t) \in CS} (t - s(i,t) + 1)/|S_i^t|} \ge \frac{\delta^* n^2 d^2}{250 \sum_{(i,t) \in CS} (t - s(i,t) + 1)/|S_i^t|}.$$
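The second inequality trades $(d-2)^2/128$ for $d^2/250$. This is exactly where the requirement $d \ge 8$ from (P1) enters. A small exact check follows, using the hypothetical helper `constant_ok`.

```python
from fractions import Fraction


def constant_ok(d: int) -> bool:
    """Check (d - 2)^2 / 128 >= d^2 / 250."""
    return Fraction((d - 2) ** 2, 128) >= Fraction(d * d, 250)


print(all(constant_ok(d) for d in range(8, 10_000)))  # True
print(constant_ok(8), constant_ok(7))                 # True False
```

At $d = 7$ the inequality just fails ($25 \cdot 250 = 6250 < 49 \cdot 128 = 6272$). Since $(d-2)/d$ increases with d, the check at $d = 8$ covers all larger d.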

It remains to upper bound the sum in the denominator. Since every term in the denominator is nonnegative, it suffices to bound the extended sum where $i \in [2, d-1]$ and $t \in [1,n]$. For each fixed $t \in [1,n]$, we can bound the terms of this sum corresponding to t by:

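$$\sum_{i=2}^{d-1} \frac{t - s(i,t) + 1}{|S_i^t|} = \sum_{i=2}^{d-1} \frac{1}{|S_i^t|} + \sum_{i=2}^{d-1} \frac{t - s(i,t)}{|S_i^t|}.$$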

We bound the first sum using (P3) and (P4):

$$\sum_{i=2}^{d-1} \frac{1}{|S_i^t|} \le \gamma \sum_{j \ge 0} 2^{-j} = 2\gamma.$$

To bound the second sum we claim that for any i-epoch H containing t and any time u satisfying $s(H) \le u \le t$:

$$\frac{t-u}{|S^H|} = \frac{1}{\lambda}\left(\rho^{t-1}(S^H) - \rho^{u-1}(S^H)\right). \qquad (7)$$

This follows since all sites in $\{i\} \times (H - \{c(H)\})$ are green, so the set of keys stored in $S^H$ after step $t-1$ consists of those stored after step $u-1$ together with those loaded during steps $u, \ldots, t-1$, and therefore the density increases by exactly $\lambda(t-u)/|S^H|$.

Fix $i \in [2, d-1]$. Let $E = E(i,t)$ and $F = E(i-1,t)$. Applying (7) with $H = E$ and $u = s(i,t)$ expresses the summand of the second sum as a difference of densities. It is hard to analyze the sum of these differences directly, so we do a little more manipulation. Since $E \subseteq F$, we can apply (P3) together with (7) with $H = F$ and $u = s(i,t)$ to get:

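$$\frac{t - s(i,t)}{|S^E|} \ge 2\,\frac{t - s(i,t)}{|S^F|} = \frac{2}{\lambda}\left(\rho^{t-1}(S^F) - \rho^{s(i,t)-1}(S^F)\right). \qquad (8)$$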

Subtracting (8) from twice (7) (with $H = E$ and $u = s(i,t)$), and rearranging terms, gives:

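$$\frac{t - s(i,t)}{|S^E|} \le \frac{2}{\lambda}\left[\left(\rho^{t-1}(S^E) - \rho^{t-1}(S^F)\right) + \left(\rho^{s(i,t)-1}(S^F) - \rho^{s(i,t)-1}(S^E)\right)\right]. \qquad (9)$$

When we sum this over $i = 2$ to $d-1$, the first part of the sum telescopes to $\frac{2}{\lambda}\left(\rho^{t-1}(S^{E(d-1,t)}) - \rho^{t-1}(S^{E(1,t)})\right)$. By (P5), this is at most $\frac{2}{\lambda}(1 - \delta_0 e^{-\alpha}) \le \frac{2}{\lambda}(1 - \delta_0 + \alpha)$.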

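For the second part we have by (P5) (since E starts at time $s(i,t)$):

$$\frac{2}{\lambda}\left(\rho^{s(i,t)-1}(S^F) - \rho^{s(i,t)-1}(S^E)\right) \le \frac{2}{\lambda}\left(1 - e^{-\alpha}\right)\rho^{s(i,t)-1}(S^F) \le \frac{2\alpha}{\lambda}.$$

Summing over $i \in [2, d-1]$ gives at most $\frac{2\alpha}{\lambda}(d-2)$.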

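Combining the three parts of the sum in the denominator and summing over $t \in [1,n]$, we get an upper bound of $\frac{2n}{\lambda}\left(\alpha(d-1) + \gamma\lambda + 1 - \delta_0\right) \le \frac{2n}{\lambda}\left(\alpha d + \gamma\lambda + 1 - \delta_0\right)$. This yields the desired lower bound $\chi \ge \delta^*\lambda n d^2 / \left(500(\alpha d + \gamma\lambda + 1 - \delta_0)\right)$. □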
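The last step combines three bounds per column with the AM-HM estimate. The following sketch recomputes that bookkeeping numerically. The parameter values are arbitrary and purely illustrative (they are not taken from the paper), and `per_step_bound` and `lemma6_bound` are hypothetical names.

```python
def per_step_bound(alpha, gamma, lam, delta0, d):
    """Upper bound on sum_{i=2}^{d-1} (t - s(i,t) + 1)/|S_i^t| for one fixed t."""
    first = 2 * gamma                               # geometric series from (P3), (P4)
    telescoped = (2 / lam) * (1 - delta0 + alpha)  # first part of (9), via (P5)
    second = (2 * alpha / lam) * (d - 2)           # second part of (9), via (P5)
    return first + telescoped + second


def lemma6_bound(n, d, alpha, gamma, lam, delta0, delta_star):
    """The lower bound of Lemma 6 on the total cost."""
    return delta_star * lam * n * d * d / (500 * (alpha * d + gamma * lam + 1 - delta0))


params = dict(alpha=0.01, gamma=0.05, lam=0.2, delta0=0.4, d=12)
n, delta_star = 10**6, 0.4
denominator = n * per_step_bound(**params)
closed_form = (2 * n / params["lam"]) * (
    params["alpha"] * params["d"] + params["gamma"] * params["lam"] + 1 - params["delta0"])
via_am_hm = delta_star * n**2 * params["d"] ** 2 / (250 * denominator)
print(denominator <= closed_form, via_am_hm >= lemma6_bound(n, delta_star=delta_star, **params))
# True True
```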



6 Construction of a good segment table adversary 

In this section we give a construction for a segment table adversary. The construction takes two parameters: the number of levels d and an auxiliary potential function parameter $\kappa > 0$. In Lemma 11 we prove that for a particular choice of parameters d and $\kappa$ the adversary is well-defined and satisfies the properties (P1)-(P7) for specific choices of $\alpha$, $\gamma$, $\delta^*$.

To specify the strategy we need a rule which, given a step t and the first $t-1$ columns of the table (including the colors), selects the segments for column t. To define the rule, it will be convenient to augment the segment table by defining an auxiliary segment $T_i^t$ at each site $(i,t)$, which we view as sharing the site $(i,t)$ with $S_i^t$ in the segment table. The segments $T_i^t$ will satisfy:

$$T_1^t \supseteq S_1^t \supseteq T_2^t \supseteq S_2^t \supseteq \cdots \supseteq T_d^t \supseteq S_d^t.$$

Our adversary will select the $T_i^t$ and $S_i^t$ in the order above. To describe how this is done, we need some additional definitions and observations.

• The segment $W^t$ is defined as follows: Divide $[1,m]$ into segments from left to right $A_1, A_2, \ldots, A_r$, where each $A_i$ has size $\lceil n/2 \rceil$ except the last, $A_r$, whose size is between $\lceil n/2 \rceil$ and $2\lceil n/2 \rceil \le n+1$. Take $W^t$ to be the segment $A_i$ that maximizes $\rho^{t-1}(A_i)$ (breaking ties arbitrarily). Observe that $\rho^{t-1}(W^t) \ge \rho^{t-1}([1,m]) \ge \delta_0$.
• For a segment T, $\mathrm{middle}(T)$ is defined as follows: Break T into three segments from left to right, L, M, R, where $|L| = |R| = \lfloor |T|/3 \rfloor$. Then $\mathrm{middle}(T)$ is the segment M.
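Both definitions are purely combinatorial, so they translate directly into code. In this sketch, cells are numbered $1..m$, segments are inclusive pairs, and `weights[c-1]` is assumed to hold the weight stored in cell c (0, $\lambda$ or 1). Ties are broken by taking the first maximum; the text allows any rule. The names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive [lo, hi], cells numbered 1..m


def blocks(m: int, n: int) -> List[Segment]:
    """A_1, ..., A_r: consecutive blocks of size ceil(n/2); the last absorbs the remainder."""
    size = -(-n // 2)                       # ceil(n/2)
    r = max(1, m // size)
    cuts = [(j * size + 1, (j + 1) * size) for j in range(r - 1)]
    cuts.append(((r - 1) * size + 1, m))
    return cuts


def choose_W(weights: List[float], n: int) -> Segment:
    """W^t: the block of maximum density (first maximum on ties)."""
    def density(seg: Segment) -> float:
        lo, hi = seg
        return sum(weights[lo - 1:hi]) / (hi - lo + 1)
    return max(blocks(len(weights), n), key=density)


def middle(seg: Segment) -> Segment:
    """Drop floor(|T|/3) cells from each end of T."""
    lo, hi = seg
    cut = (hi - lo + 1) // 3
    return (lo + cut, hi - cut)


w = [0.0] * 20
w[6:9] = [1.0, 1.0, 1.0]                    # cells 7, 8, 9 are fully loaded
print(blocks(20, 6)[-1], choose_W(w, 6), middle((1, 10)))  # (16, 20) (7, 9) (4, 7)
```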

Let S be a segment and $\rho^{t-1}$ be the density function at the end of step $t-1$. Let $\kappa > 0$.

• S is $\kappa$-upper balanced (with respect to $\rho^{t-1}$) if every subsegment of size at least $|S|/4$ has density at most $\rho^{t-1}(S)\,4^{\kappa}$.

• S is $\kappa$-lower balanced if every subsegment of size at least $|S|/4$ has density at least $\rho^{t-1}(S)(1/4)^{\kappa}$.

• Define the $\kappa$-potential of segment S to be $\phi^{t-1}(S) = |S|\,\rho^{t-1}(S)^{1/\kappa}$.

• For a segment T, $\mathrm{densify}_\kappa^{t-1}(T)$ is the subsegment D of T that maximizes $\phi^{t-1}(D)$ (breaking ties arbitrarily).

We note the following easy facts. 

Proposition 8 Let T be an arbitrary segment and $t \in [1,n]$.

1. The size $|T|$ is at least its potential (since $\rho^{t-1}(T) \le 1$).

2. $\rho^{t-1}(\mathrm{densify}_\kappa^{t-1}(T)) \ge \rho^{t-1}(T)$.

3. If T is not $\kappa$-upper balanced (with respect to $\rho^{t-1}$) and D is a subsegment of T that violates the condition of $\kappa$-upper balance, then $\phi^{t-1}(D) > \phi^{t-1}(T)$. Thus, since $\mathrm{densify}_\kappa^{t-1}(T)$ has no subsegment with larger $\phi^{t-1}$, it must be $\kappa$-upper balanced.
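Parts 2 and 3 of Proposition 8 can be observed on small instances with a brute-force densify. In this sketch, the cell weights are assumed values in {0, 0.5, 1}, and `Densities`, `densify` and `upper_balanced` are hypothetical names. The quadratic enumeration is only for illustration and says nothing about how efficiently the adversary can be computed.

```python
import random
from itertools import accumulate
from typing import List, Tuple

Segment = Tuple[int, int]  # inclusive [lo, hi] of 0-based cell indices


class Densities:
    """Prefix sums over per-cell weights, giving O(1) density queries."""

    def __init__(self, weights: List[float]):
        self.prefix = [0.0] + list(accumulate(weights))

    def rho(self, seg: Segment) -> float:
        lo, hi = seg
        return (self.prefix[hi + 1] - self.prefix[lo]) / (hi - lo + 1)

    def phi(self, seg: Segment, kappa: float) -> float:
        """kappa-potential |S| * rho(S)^(1/kappa)."""
        return (seg[1] - seg[0] + 1) * self.rho(seg) ** (1 / kappa)


def subsegments(seg: Segment, min_size: int = 1):
    lo, hi = seg
    for a in range(lo, hi + 1):
        for b in range(a + min_size - 1, hi + 1):
            yield (a, b)


def densify(dens: Densities, T: Segment, kappa: float) -> Segment:
    """Brute force: a subsegment of T of maximum potential (first one on ties)."""
    return max(subsegments(T), key=lambda D: dens.phi(D, kappa))


def upper_balanced(dens: Densities, S: Segment, kappa: float) -> bool:
    """Every subsegment of size >= |S|/4 has density <= rho(S) * 4^kappa."""
    size = S[1] - S[0] + 1
    bound = dens.rho(S) * 4 ** kappa
    return all(dens.rho(U) <= bound * (1 + 1e-9)
               for U in subsegments(S, min_size=-(-size // 4)))


random.seed(1)
weights = [random.choice([0.0, 0.5, 1.0]) for _ in range(40)]
dens = Densities(weights)
T = (0, 39)
D = densify(dens, T, kappa=0.03)
print(D, upper_balanced(dens, D, 0.03), dens.rho(D) >= dens.rho(T))
```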

Finally, we define $\mathrm{balance}_\kappa^{t-1}(T)$ to be the subsegment $\mathrm{middle}(\mathrm{densify}_\kappa^{t-1}(T))$ of T. The properties of $\mathrm{balance}_\kappa^{t-1}(T)$ that we need are given by the following lemma.

Lemma 9 Assume $\kappa \le 1/(24\ln(4))$. Fix a step $t \in [1,n]$. Let T be a segment, let $D = \mathrm{densify}_\kappa^{t-1}(T)$ and $S = \mathrm{balance}_\kappa^{t-1}(T) = \mathrm{middle}(D)$. Assume $|S| \ge 4$. Then

• $\rho^{t-1}(S) \ge e^{-24\ln(4)\kappa}\rho^{t-1}(D) \ge e^{-24\ln(4)\kappa}\rho^{t-1}(T)$.

• S is $25\kappa$-lower balanced with respect to $\rho^{t-1}$.

• $\phi^{t-1}(S) \ge \phi^{t-1}(T)\,e^{-(24\ln(4) + \ln(3))}$.
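The three conclusions of Lemma 9 can be checked empirically on random instances with a brute-force densify. The sketch below is illustrative and rests on several assumptions. Cell weights are in {0, 0.5, 1}. A dense run is planted so that the hypothesis $|S| \ge 4$ is usually met (instances with $|S| < 4$ are skipped). Floating-point comparisons use a small relative tolerance. `lemma9_holds` is a hypothetical name.

```python
import math
import random
from itertools import accumulate


def lemma9_holds(weights, T, kappa, rel=1e-9):
    """Check the three conclusions of Lemma 9 for balance(T) on one instance.

    weights[c] is the weight stored in cell c; segments are inclusive (lo, hi).
    Returns True when |S| < 4, since the lemma says nothing in that case.
    """
    prefix = [0.0] + list(accumulate(weights))
    rho = lambda lo, hi: (prefix[hi + 1] - prefix[lo]) / (hi - lo + 1)
    phi = lambda lo, hi: (hi - lo + 1) * rho(lo, hi) ** (1 / kappa)

    lo, hi = T
    D = max(((a, b) for a in range(lo, hi + 1) for b in range(a, hi + 1)),
            key=lambda seg: phi(*seg))                 # densify(T), brute force
    cut = (D[1] - D[0] + 1) // 3
    S = (D[0] + cut, D[1] - cut)                       # balance(T) = middle(D)
    size = S[1] - S[0] + 1
    if size < 4:
        return True

    c = 24 * math.log(4)
    part1 = rho(*S) >= math.exp(-c * kappa) * rho(*D) * (1 - rel)
    quarter = -(-size // 4)                            # ceil(|S| / 4)
    part2 = all(rho(a, b) >= rho(*S) * 4 ** (-25 * kappa) * (1 - rel)
                for a in range(S[0], S[1] + 1)
                for b in range(a + quarter - 1, S[1] + 1))
    part3 = phi(*S) >= phi(lo, hi) * math.exp(-(c + math.log(3))) * (1 - rel)
    return part1 and part2 and part3


random.seed(2)
kappa = 1 / (24 * math.log(4))          # the largest value Lemma 9 allows
results = []
for _ in range(20):
    w = [random.choice([0.0, 0.5, 1.0]) for _ in range(36)]
    start = random.randrange(20)
    w[start:start + 14] = [1.0] * 14    # plant a dense run so that |S| >= 4
    results.append(lemma9_holds(w, (0, 35), kappa))
print(all(results))
```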



We give the proof in Section 6.1.



We are now ready to describe the adversary strategy for selecting segments in column t at step t. For $t \ge 2$, this selection will depend on the segments and coloring of the previous column. Recall that the red-green coloring of the previous column $t-1$ is determined by the action of the algorithm in response to $y^{t-1}$, and that $B^{t-1}$ is the busy segment at step $t-1$, which is the minimal segment of the array in which all rearrangements occurred. The adversary depends on the parameter d (the number of levels) and the potential function parameter $\kappa > 0$.

Adversary(d, κ)

• Preservation rule: If $t \ge 2$ then for $i = 1, \ldots, d$, if $(i, t-1)$ is green then $T_i^t = T_i^{t-1}$ and $S_i^t = S_i^{t-1}$. (Copy the corresponding segments from the previous column.)

• Let $j^t$ be the first level i to which the preservation rule does not apply. This is the t-critical level.

• (Rebuilding rule) For $i = j^t, \ldots, d$:

  - Determine $T_i^t$:

    * If i is the critical level then:

      · If $i = 1$ then $T_1^t = W^t$.

      · If $i > 1$ then $T_i^t = T_i^{t-1} \cup B^{t-1}$.

    * If i is not the critical level then $T_i^t = \mathrm{middle}(S_{i-1}^t)$.

  - Determine $S_i^t$: $S_i^t = \mathrm{balance}_\kappa^{t-1}(T_i^t)$.

Remark. In the case that i is the critical level and $i > 1$, $T_i^t$ is defined to be the union of two segments. This union is required to be a segment, and for this we need $T_i^{t-1} \cap B^{t-1} \neq \emptyset$. But this is clear, since $B^{t-1}$ includes the locations where the gap selected for $y^{t-1}$ was stored, and those locations are in $T_i^{t-1}$.
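The control flow of Adversary(d, κ), preservation followed by rebuilding from the critical level down, can be sketched as below. This is a toy under explicit assumptions. Segments are inclusive pairs, levels are 0-based, and the real $\mathrm{balance}_\kappa^{t-1}$ (which needs $\rho^{t-1}$) is replaced by a hypothetical stand-in passed as a callable; the toy run uses `middle` itself. `build_column` is a hypothetical name.

```python
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int]  # inclusive [lo, hi]
Column = Tuple[List[Segment], List[Segment], List[str]]  # (T's, S's, colors)


def union(a: Segment, b: Segment) -> Segment:
    """Union of two intersecting segments, which is again a segment."""
    assert a[0] <= b[1] and b[0] <= a[1], "segments must intersect"
    return (min(a[0], b[0]), max(a[1], b[1]))


def build_column(d: int, prev: Optional[Column], busy_prev: Optional[Segment],
                 W: Segment, balance: Callable[[Segment], Segment],
                 middle: Callable[[Segment], Segment]):
    """Select T_1^t..T_d^t and S_1^t..S_d^t following Adversary(d, kappa).

    Levels are 0-based here. `prev` is None at t = 1; otherwise it holds
    column t-1 with its colors. `balance` stands in for balance_kappa^{t-1}.
    """
    T: List[Segment] = []
    S: List[Segment] = []
    level = 0
    if prev is not None:                       # preservation rule: copy greens
        T_prev, S_prev, colors = prev
        while level < d and colors[level] == "green":
            T.append(T_prev[level])
            S.append(S_prev[level])
            level += 1
    critical = level                           # first level not preserved
    for i in range(critical, d):               # rebuilding rule
        if i == critical:
            T_i = W if i == 0 else union(prev[0][i], busy_prev)
        else:
            T_i = middle(S[i - 1])
        T.append(T_i)
        S.append(balance(T_i))
    return T, S


def middle(seg: Segment) -> Segment:
    cut = (seg[1] - seg[0] + 1) // 3
    return (seg[0] + cut, seg[1] - cut)


# Toy run: `balance` is replaced by `middle` purely for illustration.
T1, S1 = build_column(3, None, None, W=(1, 81), balance=middle, middle=middle)
T2, S2 = build_column(3, (T1, S1, ["green", "red", "red"]), busy_prev=(30, 44),
                      W=(1, 81), balance=middle, middle=middle)
print(T2, S2)  # [(1, 81), (30, 45), (37, 38)] [(28, 54), (35, 40), (37, 38)]
```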

In Section 6.1 we prove Lemma 9. In Section 6.3 we state and prove a lemma that shows that Adversary(d, κ) satisfies the desired properties (P1)-(P7) for particular choices of $\alpha$, $\gamma$, $\delta^*$.

6.1 Proof of Lemma 9

We now turn to verifying the needed properties of the function $\mathrm{balance}_\kappa$.

Throughout this section, we fix t. We omit the superscript $t-1$ from balance, densify, $\rho$, $\phi$ and w. Also, we fix $\kappa$ and omit the subscript $\kappa$ from balance and densify.

As noted earlier, $D = \mathrm{densify}(T)$ is $\kappa$-upper balanced.

Claim 10 Let U be a subsegment of S of size at least $|S|/4$. Then $\rho(U) \ge (1/4)^{24\kappa}\rho(D)$.

Given the claim, we deduce the lemma. For the first part, we apply the claim with $U = S$ and note that $\rho(D) \ge \rho(T)$. For the second part we combine the claim with the fact that $\rho(D) \ge \rho(S)(1/4)^{\kappa}$ (which holds since $|S| \ge |D|/3$ and D is $\kappa$-upper balanced). For the third part of the lemma, we note that $\phi(S)/\phi(T) \ge \phi(S)/\phi(D) \ge \frac{1}{3}(\rho(S)/\rho(D))^{1/\kappa}$, and apply the first inequality of the first part.

So it suffices to prove the claim. The set $D - U$ consists of two segments L (on the left) and R (on the right). The weight of D can be written as:

$$|D|\rho(D) = |L|\rho(L) + |R|\rho(R) + |U|\rho(U),$$

which implies:

$$\rho(U) = \frac{1}{|U|}\left(|D|\rho(D) - |L|\rho(L) - |R|\rho(R)\right).$$

Since $|S| \ge 4$ and $S = \mathrm{middle}(D)$, it follows that $|D| \ge 8$ and $|S| \le |D|/2$. Since $U \subseteq S = \mathrm{middle}(D)$, we have $|L|, |R| \ge |D|/4$, and since D is $\kappa$-upper balanced, $\rho(L), \rho(R) \le 4^{\kappa}\rho(D)$. So:
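$$\begin{aligned}
\rho(U) &\ge \frac{\rho(D)}{|U|}\left(|D| - (|L| + |R|)4^{\kappa}\right) \\
&= \frac{\rho(D)4^{\kappa}}{|U|}\left(|D|4^{-\kappa} - |L| - |R|\right) \\
&= \frac{\rho(D)4^{\kappa}}{|U|}\left(|D|4^{-\kappa} - |D| + |U|\right) \\
&\ge \frac{\rho(D)}{|U|}\left(|U| - |D|(1 - e^{-\ln(4)\kappa})\right) \\
&\ge \rho(D)\left(1 - \frac{|D|}{|U|}\ln(4)\kappa\right) \\
&\ge \rho(D)\left(1 - 12\ln(4)\kappa\right) \ge \rho(D)\,4^{-24\kappa},
\end{aligned}$$

where the fourth line uses $4^{\kappa} \ge 1$, the fifth uses $1 - e^{-x} \le x$, the sixth uses $|U| \ge |S|/4 \ge |D|/12$, and the final inequality uses the hypothesis that $\kappa \le 1/(24\ln(4))$ and the inequality $(1 - x) \ge e^{-2x}$ for $x \le 1/2$.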
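The final step uses $1 - x \ge e^{-2x}$ on $[0, 1/2]$ with $x = 12\ln(4)\kappa$, so that $e^{-2x} = 4^{-24\kappa}$. A grid check of both forms is shown below. It is numerical evidence only; the analytic reason is that $1 - x - e^{-2x}$ is concave, vanishes at 0 and is positive at $1/2$.

```python
import math

# 1 - x >= exp(-2x) on a fine grid of [0, 1/2].
grid = [k / 10_000 for k in range(5_001)]
elementary = all(1 - x >= math.exp(-2 * x) - 1e-15 for x in grid)

# The same inequality in the form used above, for kappa up to 1/(24 ln 4).
kappa_max = 1 / (24 * math.log(4))
kappas = [kappa_max * j / 100 for j in range(101)]
claim_form = all(1 - 12 * math.log(4) * k >= 4 ** (-24 * k) - 1e-15 for k in kappas)

print(elementary, claim_form)  # True True
```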

6.2 Setting the parameters 

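The adversary strategy depends on the parameters $\kappa$ and d. The properties we need involve the parameters $\alpha$, $\gamma$ and $\delta^*$.

We will need the following constants:

• $C_5$, needed in Lemma 12.

• $C_0$, which is a lower bound on n (chosen after $C_5$ is chosen) imposed to make various conditions hold.

We impose the following hypotheses on n and $\delta_0$:

• (A1) $n \ge C_0$.

• (A2) $\delta_0 \in (\ln(n)^{-2}, 1 - n^{-1/5})$.

We set the parameters as follows:

$$\kappa = \frac{2\ln(1/\delta_0)}{\ln(n)} \qquad (10)$$

$$d = \left\lfloor \frac{(1-\delta_0)\ln(n)}{8C_5\ln(1/\delta_0)} \right\rfloor \qquad (11)$$

$$\alpha = 2C_5\kappa = \frac{4C_5\ln(1/\delta_0)}{\ln(n)} \qquad (12)$$

$$\gamma = n^{-1/4} \qquad (13)$$

$$\delta^* = \delta_0 e^{-2C_5\kappa d} \qquad (14)$$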
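To see how these settings interact, the sketch below evaluates (10)-(14) and tests the constraints (I1)-(I7) listed below together with (P1). Several inputs are assumptions made for illustration. It takes $C_5 = 25\ln(4) + \ln(3)$, the largest of the lower bounds on $C_5$ required in the proof of Lemma 12. It uses the form of (14) given above. It works with $\ln(n)$ instead of n to avoid floating-point overflow. The function name `parameters` is hypothetical.

```python
import math


def parameters(ln_n: float, delta0: float, C5: float):
    """Parameter choices (10)-(14), computed from ln(n) to avoid overflow."""
    L = math.log(1 / delta0)
    kappa = 2 * L / ln_n                                  # (10)
    d = math.floor((1 - delta0) * ln_n / (8 * C5 * L))   # (11)
    alpha = 2 * C5 * kappa                                # (12)
    ln_gamma = -ln_n / 4                                  # (13): gamma = n^(-1/4)
    delta_star = delta0 * math.exp(-2 * C5 * kappa * d)   # (14)
    return kappa, d, alpha, ln_gamma, delta_star


C5 = 25 * math.log(4) + math.log(3)
ln_n, delta0 = 20_000.0, 0.5
kappa, d, alpha, ln_gamma, delta_star = parameters(ln_n, delta0, C5)
checks = {
    "I1": math.log(delta_star) - ln_gamma >= math.log(2),
    "I2": kappa >= 2 * math.log(1 / delta0) / ln_n,
    "I3": d <= (ln_gamma + ln_n / 2) / (2 * C5),          # ln(gamma * sqrt(n)) / (2 C5)
    "I4": ln_gamma <= math.log(1 / 4),
    "I5": d * kappa <= math.log(delta0 / delta_star) / (2 * C5) + 1e-12,
    "I6": kappa <= min(1 / (24 * math.log(4)), 1 / 50),
    "I7": alpha >= 2 * C5 * kappa,
    "P1": d >= 8,
    "alpha*d": alpha * d <= (1 - delta0) / 2,
    "delta*": delta_star >= delta0 / math.e,
}
print(d, all(checks.values()))  # 50 True
```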

Lemma 11 Let m, n, $n_0$, $Y^0$, $\delta_0$, $\mu_0$ be as in Lemma 3. Let the parameters be set according to (10)-(14). Let $\lambda$ be the weight parameter as previously specified. Assume (A1) and (A2). Then Adversary(d, κ) satisfies (P1)-(P7).
As one would expect, the choice of parameters is dictated by two considerations. During the proof of Lemma 11 various inequalities involving the parameters will be needed, and the parameters must satisfy these. Subject to these inequalities, we seek to (approximately) maximize the expression in the lower bound of Lemma 6.

To isolate some of the technical computations, before presenting the proof of Lemma 11 we collect together the inequalities involving the parameters that are needed in the proof. We then explain how the parameters were chosen to satisfy these inequalities and (approximately) maximize the lower bound of Lemma 6.

(I1) $\delta^*/\gamma \ge 2$. (This will ensure that $S_d^t$ satisfies the hypotheses of Lemma 5 and thus has a suitable gap.)

(I2) $\kappa \ge 2\ln(1/\delta_0)/\ln(n)$. This is needed to prove the lower bound (16) on the $\phi^{t-1}$-value of the set $T_1^t$.

(I3) $d \le \frac{1}{2C_5}\ln(\gamma\sqrt{n})$. Together with (I2) this is used to prove (23) and (25), which give lower bounds on the $\phi^{t-1}$-value of all segments in column t, which directly implies (P4).

(I4) $\gamma \le 1/4$. With (P4) this implies that every segment $S_i^t$ has size at least 4. This is a hypothesis of Lemma 12, where it is used in order to apply Lemma 9.

(I5) $d\kappa \le \frac{1}{2C_5}\ln(\delta_0/\delta^*)$. This inequality is used to prove (P6), via (24).

(I6) $\kappa \le 1/(24\ln(4))$ (for Lemma 9) and $\kappa \le 1/50$ (for the fourth claim in the proof of (P7)).

(I7) $\alpha \ge 2C_5\kappa$. This is used to prove (P5) via (18) and (19).

The choice of parameters was made based on these inequalities as follows: 

1. We will choose parameters so that the denominator in the bound of Lemma 6 is $O(1 - \delta_0)$. For this we will need that $\alpha d = O(1 - \delta_0)$ and $\gamma\lambda = O(1 - \delta_0)$.

2. The parameter $\gamma$ is involved in inequalities (I1), (I3) and (I4). These inequalities leave a lot of room, and we choose $\gamma = n^{-1/4}$. Then (I4) holds, (I1) becomes $\delta^* \ge 2n^{-1/4}$, and (I3) becomes $d \le \ln(n)/(8C_5)$. Also, by the restriction (A2) on $\delta_0$ we have $\gamma = O(1 - \delta_0)$ (as needed for the denominator in the bound of Lemma 6).

3. We want d to be large, so we want $\alpha$ to be small. So we make (I7) an equality to determine $\alpha$ (as a function of $\kappa$).

4. We choose $\kappa$ as small as possible by making (I2) an equality. Since n is sufficiently large, (I6) holds.

5. Having chosen $\alpha$ and $\kappa$, to have $\alpha d = O(1 - \delta_0)$ we want $d = O((1 - \delta_0)/\alpha)$. We also need $d \le \ln(n)/(8C_5)$ for (I3). We can satisfy both by taking $d = \lfloor (1 - \delta_0)\ln(n)/(8C_5\ln(1/\delta_0)) \rfloor$, noting that $1 - \delta_0 \le \ln(1/\delta_0)$ for all $\delta_0 \in (0,1)$.

6. In order to make (I5) hold we need $\delta^*$ small enough, but for the lower bound we want $\delta^*$ large. So we choose $\delta^*$ as large as possible subject to (I5). Note that for this choice (I1) holds: since $2C_5\kappa d = \alpha d \le 1$ we have $\delta^* \ge \delta_0/e$, and $e/\delta_0$ is at most $e\ln(n)^2$ by (A2), so $\delta^*/\gamma \ge n^{1/4}/(e\ln(n)^2) \ge 2$ for n sufficiently large.

We now proceed to the proof of Lemma 11. In the proof we refer only to the inequalities (I1)-(I7) and not to the actual values of the parameters.
6.3 The adversary satisfies the required properties

In this section we prove that Adversary(d, κ) satisfies (P1)-(P7) for a suitable choice of parameters. First we prove a lemma that relates the $\rho$ and $\phi$ values of the segments in the segment table.

Lemma 12 There are positive constants $C_5$ and $C_0$ such that the following holds for Adversary(d, κ) provided that $n \ge C_0$ and (I1)-(I7) hold. Suppose that for each $t \in [1,n]$ and $i \in [1,d]$ we have $|S_i^t| \ge 4$. For each $t \in [1,n]$:

$$\rho^{t-1}(T_1^t) \ge \delta_0 \qquad (15)$$

$$\phi^{t-1}(T_1^t) \ge \sqrt{n}/2 \qquad (16)$$

and for all $i \in [1,d]$:

(17) If t starts an i-epoch then $S_i^t$ is $25\kappa$-lower balanced with respect to $\rho^{t-1}$.

$$\rho^{t-1}(S_i^t) \ge \rho^{t-1}(T_i^t)\,e^{-C_5\kappa} \qquad (18)$$

$$\rho^{t-1}(T_{i+1}^t) \ge \rho^{t-1}(S_i^t)\,e^{-C_5\kappa} \quad \text{if } i \le d-1 \qquad (19)$$

$$\phi^{t-1}(S_i^t) \ge \phi^{t-1}(T_i^t)\,e^{-C_5} \qquad (20)$$

$$\phi^{t-1}(T_{i+1}^t) \ge \phi^{t-1}(S_i^t)\,e^{-C_5} \quad \text{if } i \le d-1 \qquad (21)$$

Proof: We prove these statements by induction on t, and for fixed t by induction on i. 
We will repeatedly use the following easy fact: 

Proposition 13 Let $S \subseteq S'$ be segments and $s < t$ be steps. Suppose that for all steps $r \in [s, t-1]$, the busy segment $B^r$ is a subset of S. Then $\rho^r(S)$, $\phi^r(S)$, $\rho^r(S)/\rho^r(S')$ and $\phi^r(S)/\phi^r(S')$ are all nondecreasing as functions of $r \in [s-1, t-1]$.

Proof: This follows from the fact that at each step in $[s, t-1]$, both $w^r(S)$ and $w^r(S')$ increase by $\lambda$, and $w^r(S) \le w^r(S')$. □
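The key point in Proposition 13 is that adding the same amount $\lambda$ to both weights can only increase the ratio $w(S)/w(S')$ when $w(S) \le w(S')$. The sketch below tracks $\rho(S)/\rho(S')$ in exact arithmetic under that assumption (each step stores one new key of weight $\lambda$ inside S and moves nothing across the boundary of S). The starting weights and sizes are arbitrary illustrative values, and `density_ratios` is a hypothetical name.

```python
from fractions import Fraction


def density_ratios(w_S, w_Sp, size_S, size_Sp, lam, steps):
    """rho(S)/rho(S') over `steps` steps, each adding weight lam inside S (so inside S')."""
    ratios = []
    for _ in range(steps + 1):
        ratios.append((w_S / size_S) / (w_Sp / size_Sp))
        w_S += lam
        w_Sp += lam
    return ratios


r = density_ratios(Fraction(3), Fraction(10), 8, 32, Fraction(1, 4), steps=6)
print(r[0], all(x <= y for x, y in zip(r, r[1:])))  # 6/5 True
```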

Proof of (15) and (16). Suppose first that t is the starting time of the 1-epoch containing t. Then $T_1^t = W^t$, and $\rho^{t-1}(W^t) \ge \delta_0$. We have $\phi^{t-1}(T_1^t) \ge (n/2)\delta_0^{1/\kappa}$, and by (I2) this is at least $\sqrt{n}/2$. For t not the starting time of the epoch we use Proposition 13.

For the proofs of the remaining parts, we will need to apply Lemma 9 with $T = T_i^t$ and $S = S_i^t$. The hypotheses of Lemma 9 follow from (I6) and the hypothesis $|S_i^t| \ge 4$ of the present lemma.
Proof of (17). This follows immediately from the second part of Lemma 9.
Proof of (18) and (20). In the case that t starts an i-epoch these follow immediately from the first and third parts of Lemma 9, provided that we choose $C_5 \ge 24\ln(4) + \ln(3)$. For a step that does not start an i-epoch, they follow from Proposition 13.



Proof of (19) and (21). Let E be the i-epoch containing t. In the case that t starts an i-epoch, these follow from the second part of Lemma 9. If t does not start an i-epoch, let s be the starting time of the epoch. We cannot apply Proposition 13 directly because, while $S_i^t = S_i^s$, it may not be true that $T_{i+1}^t = T_{i+1}^s$, because one or more new $(i+1)$-epochs may have started. However, at each of these new $(i+1)$-epochs, $i+1$ was the critical level (since the i-epoch did not end), which means that the set $T_{i+1}^t$ is equal to the union of $T_{i+1}^s$ and all of the busy segments $B^r$ for $s \le r \le t-1$. This is a subsegment of $S_i^t = S_i^s$ that contains $T_{i+1}^s$. Hence we can apply the second part of Lemma 9 to get that at the beginning of the epoch $\rho^{s-1}(T_{i+1}^t)$ is at least $e^{-C_5\kappa}\rho^{s-1}(S_i^s)$, provided that $C_5 \ge 25\ln(4)$. Now we can apply Proposition 13 to show that the same inequality holds at step $t-1$ (keeping in mind that $S_i^t = S_i^s$). Since $|T_{i+1}^t|/|S_i^s| \ge 1/3$ we also get (21), provided that $C_5 \ge 25\ln(4) + \ln(3)$. □



Proof of Lemma 11

Using Lemma 12 repeatedly, we have by induction on $i = 1, \ldots, d$ for fixed $t \in [1,n]$ that:

$$\rho^{t-1}(T_i^t) \ge \delta_0 e^{(2-2i)C_5\kappa} \qquad (22)$$

$$\phi^{t-1}(T_i^t) \ge \frac{\sqrt{n}}{2} e^{(2-2i)C_5} \ge 1/\gamma \qquad (23)$$

$$\rho^{t-1}(S_i^t) \ge \delta_0 e^{(1-2i)C_5\kappa} \ge \delta^* \qquad (24)$$

$$\phi^{t-1}(S_i^t) \ge \frac{\sqrt{n}}{2} e^{(1-2i)C_5} \ge 1/\gamma \qquad (25)$$

The final inequality of (24) follows from (I5). The final inequalities of (23) and (25) follow from (I3). Note that the final inequality of (25) and (I4) imply that, as we proceed to level i in the induction, the hypothesis $|S_i^t| \ge 4$ of Lemma 12 holds at each step.
Proof of Property (P1). By definition d is an integer; we need to verify that $d \ge 8$. Consider the definition of d given in (11). If $\delta_0 \ge 1/2$, one readily verifies that $(1 - \delta_0)/\ln(1/\delta_0) \ge 1/2$, and so $d = \Theta(\log(n))$; in particular $d \ge 8$ for n sufficiently large.

If $\delta_0 < 1/2$ then $1 - \delta_0 > 1/2$ and $\ln(1/\delta_0) < 2\ln(\ln(n))$ by assumption (A2), and again $d \ge 8$ for n sufficiently large. □

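Proof of Property (P2). $T_1^t$ is always equal to one of the sets $W^u$ (for some $u \le t$), which has size at most $n + 1$. Since $S_1^t$ is $\mathrm{middle}(D)$ for a subsegment D of $T_1^t$, $|S_1^t| \le n/2$ (for n sufficiently large), and every other segment in column t is contained in $S_1^t$. □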

Proof of Property (P3). We have $|S_{i-1}^t| \ge |T_i^t| \ge 2|S_i^t|$, since $S_i^t$ is contained in the middle of a subsegment of $T_i^t$. □

Proof of Property (P4). We have $|S_i^t| \ge \phi^{t-1}(S_i^t) \ge 1/\gamma$, by Proposition 8 and (25). □

Proof of Property (P5). This follows immediately by combining (18), (19) and (I7) (together with (15) for $i = 1$). □

Proof of Property (P6). This follows from (24). □

Proof of Property (P7). Let E be a non-terminal epoch at level i. Here we are trying to lower bound $q^E$, which is the cost of all relocations done during epoch E. Let s denote the start time and c the closing time of epoch E. The busy segment $B^c$ includes a location outside of $S^E$ (this is the reason that the epoch closed at time c). Without loss of generality, let us say that $B^c$ includes a location that is to the left of $S^E$. Let L be the left segment of $S^E - T_{i+1}^s$. The desired lower bound is an immediate consequence of the following four claims.

1. For each time $r \in E$, $T_{i+1}^r$ is a segment contained in $S^E$, $T_{i+1}^r = T_{i+1}^s \cup B^s \cup \cdots \cup B^{r-1}$, and $B^r \cap T_{i+1}^r \neq \emptyset$.

2. Every key stored in L at the start time s of E must move sometime during E.

3. At the first step t at which a key y stored in L is moved during E, we have $y \in Q_i^t$. Thus $q^E$ is at least the number of keys that were stored in L at step s.

4. The number of keys stored in L at step s is at least $|S^E|\rho^{s-1}(S^E)/8$.

For the first claim, it was noted in the remark after the description of the adversary that $T_{i+1}^r$ is a segment that intersects $B^r$. The fact that $T_{i+1}^r = T_{i+1}^s \cup B^s \cup \cdots \cup B^{r-1}$ comes from the definition of the adversary.
For the second claim, suppose for contradiction that y is a key that is stored at location $j \in L$ at the end of step $s-1$ and does not move from j throughout the epoch E. Then $j \notin T_{i+1}^s$ and $j \notin B^r$ for every $r \in E$. Then $j \notin T_{i+1}^s \cup B^s \cup \cdots \cup B^c$, which, by the first claim, is a segment. This implies that $B^c$ contains no element to the left of j, contradicting that $B^c$ contains an element to the left of L.

For the third claim, consider the first step t at which y was moved from location j. Then $j \notin B^r$ for any $r \in [s, t)$, so j is not in $T_{i+1}^t = T_{i+1}^s \cup B^s \cup \cdots \cup B^{t-1}$. Hence $j \in S_i^t - S_{i+1}^t$. Thus the relocation of y is charged to level i at step t.

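For the fourth claim, L is a subsegment of $S^E$ of size at least $|S^E|/4$. Since $S^E = S_i^s$ is $25\kappa$-lower balanced by (17), $\rho^{s-1}(L) \ge \rho^{s-1}(S^E)(1/4)^{25\kappa} \ge \rho^{s-1}(S^E)/2$, by (I6). Since every key has weight at most 1, the number of keys stored in L is at least $|L|\rho^{s-1}(L) \ge |S^E|\rho^{s-1}(S^E)/8$. □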



This completes the proof of Lemma 11. □



6.4 Proof of Lemma 3



The hypotheses of Lemma 3 give us that n is sufficiently large, $\delta_0 \in (\ln(n)^{-2}, 1 - n^{-1/5})$, and $\mathrm{mingap}(Y^0) \ge 1 + 12/\delta_0$. Then by Lemma 11, Adversary(d, κ) satisfies (P1)-(P7) for d, $\kappa$, $\alpha$, $\gamma$, $\delta^*$ given by (10)-(14).

We apply Lemma 6 with these parameters. The denominator in the bound of Lemma 6 is $\Theta(\alpha d + \gamma\lambda + 1 - \delta_0)$. The settings given by (12) and (11) give $\alpha d \le (1 - \delta_0)/2$. The setting $\gamma = n^{-1/4}$, together with $\lambda \le 1$ and assumption (A2), gives $\gamma\lambda \le 1 - \delta_0$. So the denominator is $\Theta(1 - \delta_0)$.

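For the numerator, the setting of $\delta^*$ gives $\delta^* \ge \delta_0/e$, and the setting of d gives $d^2 = \Theta\left((\ln(n))^2(1 - \delta_0)^2/(\ln(1/\delta_0))^2\right)$. Simplifying the fraction gives:

$$\chi \ge \Omega\left(n\ln(n)^2 \cdot \frac{\lambda\,\delta_0(1 - \delta_0)}{(\ln(1/\delta_0))^2}\right),$$

as required.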

In the case that $\mathrm{mingap}(Y^0) \ge 2^n$, $\lambda = 1$, and in the other case $\lambda = \Theta(\delta_0)$.



7 An upper bound for inserting a small number of items 

In this section, we show an interesting upper bound on $\chi_n(m)$ for the case that n is a polylogarithmic function of m.

Theorem 14 Let $m \ge 2^{16}$ and let k be an integer such that $k \le \frac{1}{2}\sqrt{\log m/\log\log m}$. Assume $n \le \log(m)^{k/3}$. Then $\chi_n(m) \le (2k - 1)n$, i.e., there is an algorithm that loads n keys into an array of size m with amortized cost of $2k - 1$ per key.

Proof: We proceed by induction on k. To simplify the description we assume (without loss of generality) that cells 1 and m are initially loaded with keys $y_{\min}$ and $y_{\max}$, which are, respectively, lower and upper bounds on all keys. Set $Y^0 = \{y_{\min}, y_{\max}\}$.

At any time the array has certain occupied cells. A segment of cells whose leftmost and rightmost cells are occupied and all others are unoccupied is called an open segment; the keys in the leftmost and rightmost cells of the open segment S are denoted $y_L(S)$ and $y_R(S)$ (we include the occupied end cells in the open segment for convenience in some calculations). The initial open segment has size m. The segment is said to be usable if $|S| \ge 3$ (which means there is at least one unoccupied cell). For any new key y not stored in the array there is a unique open segment S such that $y_L(S) < y < y_R(S)$; we say that S is compatible with y. If key y is assigned to an unoccupied cell in S then the open segment S is split into two open segments which overlap at the cell containing y; the sum of the sizes of these two segments is $|S| + 1$. A middle cell of S is a cell such that the two segments obtained from S each have size at least $|S|/2$. It's easy to check that every usable segment has a middle cell. More generally, it can be checked that given $q - 1$ items to be placed in an open segment S that has at least $q - 1$ unoccupied cells, we can place them evenly so that each of the q open segments produced has size at least $|S|/q$. (The worst case is $|S| = aq + 1$ for some integer a, and in this case each of the q resulting subsegments has length $a + 1 > |S|/q$.)
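Both placement facts are constructive. The sketch below picks a middle cell, and spreads $q - 1$ items over an open segment $[lo, hi]$ (cells inclusive, both ends occupied) so that each of the q resulting open segments has size at least $|S|/q$. It is one possible placement rule, not necessarily the one intended in the text. `middle_cell` and `even_placement` are hypothetical names.

```python
def middle_cell(lo: int, hi: int) -> int:
    """A cell c of the open segment [lo, hi] (|S| >= 3) that splits it into
    [lo, c] and [c, hi], both of size at least |S|/2."""
    size = hi - lo + 1
    assert size >= 3, "segment is not usable"
    return lo + size // 2


def even_placement(lo: int, hi: int, q: int) -> list:
    """Cells for q - 1 new items so that each of the q open segments has size >= |S|/q."""
    size = hi - lo + 1
    assert size - 2 >= q - 1, "not enough unoccupied cells"
    return [lo + (j * (size - 1)) // q for j in range(1, q)]


lo, hi, q = 1, 23, 5
cells = [lo] + even_placement(lo, hi, q) + [hi]
sizes = [b - a + 1 for a, b in zip(cells, cells[1:])]
print(cells, sizes)  # [1, 5, 9, 14, 18, 23] [5, 5, 6, 5, 6]
```

The sizes sum to $|S| + q - 1 = 27$, matching the observation that each split adds one to the total size.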

We will define algorithms $A_k$ for $k \ge 1$. It will be obvious from the definitions that the cost per item loaded is at most $2k - 1$. The main technical question will be how many items $A_k$ can handle. Let us define $n_k(m)$ to be the maximum number of items that $A_k$ can handle in an interval of size at least m (the argument m need not be an integer). Our goal is to show that $n_k(m) \ge \lfloor \log(m)^{k/3} \rfloor$.

We now define algorithm $A_1$ for $k = 1$. For each successive key $y^t$ ($t \ge 1$), we identify the open segment S compatible with $y^t$. If it is usable, we store $y^t$ in the middle cell of the segment.

Let us analyze this algorithm. We never move any loaded key, so the cost per loaded key is 1. We want to lower bound the number of items that can be loaded. The size of the initial open segment is m, so after loading $t - 1$ items every open segment has size at least $m/2^{t-1}$, and we can handle $y^t$ if this is at least 3. Thus we can load t items provided that $t \le \log_2(m/3) + 1$. So $n_1(m) \ge \lfloor \log_2(m/3) \rfloor + 1$, which is at least $(\log_2 m)^{1/3}$ for m large enough.
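A direct simulation of $A_1$ makes the halving argument concrete. In this sketch, the occupied cells and their keys are kept in two parallel sorted lists. The sentinels $y_{\min}$ and $y_{\max}$ are modelled as $-\infty$ and $+\infty$ in cells 1 and m. Feeding increasing or decreasing keys makes every insertion land in the most recently halved segment. `run_A1` is a hypothetical name.

```python
import bisect


def run_A1(m: int, keys) -> int:
    """Load keys one at a time with A_1; return how many were loaded before a
    key arrived whose compatible open segment was not usable.

    Cells 1 and m hold y_min and y_max; keys must lie strictly between them.
    """
    cells = [1, m]                              # occupied cells, sorted
    stored = [float("-inf"), float("inf")]      # keys in those cells, same order
    loaded = 0
    for y in keys:
        j = bisect.bisect_left(stored, y)       # compatible segment is [cells[j-1], cells[j]]
        lo, hi = cells[j - 1], cells[j]
        size = hi - lo + 1
        if size < 3:                            # not usable: A_1 cannot continue
            break
        c = lo + size // 2                      # a middle cell; no key is ever moved
        cells.insert(j, c)
        stored.insert(j, y)
        loaded += 1
    return loaded


print(run_A1(1024, range(1, 100)), run_A1(1024, range(1000, 0, -1)))  # 9 10
```

For $m = 1024$ the bound above gives $\lfloor \log_2(1024/3) \rfloor + 1 = 9$, which is exactly what the increasing sequence achieves.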

For $k \ge 2$ we define $A_k$, which makes use of $A_{k-1}$. We initially load $q = \lfloor \log(m)^{(k-1)/3} \rfloor$ items using algorithm $A_{k-1}$, which by induction can be done at an amortized cost of $2k - 3$ moves per item. We then move all of the items so that they are as evenly spaced as possible along the array, which increases the amortized cost per item to $2k - 2$. Each of the resulting open segments has size at least $m/(q + 1)$. Let us define $s_j$ for $j \ge 0$ to be $2^{-j}m/(q + 1)$.

Next the algorithm works in rounds. Let $\mathrm{old}_R$ denote the set of keys loaded prior to round $R$.
The algorithm will ensure that the open segments defined by the locations of $\mathrm{old}_R$ at the beginning
of round $R$ all have size at least $s_{R-1}$. We have already seen that this holds when $R = 1$. Round
$R$ consists of two phases. During the first phase we load $n_{k-1}(s_{R-1})$ new items without
moving any items in $\mathrm{old}_R$. During the second phase we move items (both old and new) to ensure
that all open segments have size at least $s_R = s_{R-1}/2$.

During the first phase we refer to the open segments defined by the storage function at the
beginning of the phase as working segments. We run $A_{k-1}$ independently on each working
segment: when a key arrives, we assign it to the working segment it is compatible with and load it
into that working segment using $A_{k-1}$. Since each working segment has size at least $s_{R-1}$ and we
load at most $n_{k-1}(s_{R-1})$ keys in all of the segments together, we are guaranteed that each of the independent
copies of $A_{k-1}$ successfully loads all of its assigned keys at an amortized cost of $2k - 3$.
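The routing step of the first phase can be sketched in Python. This is an illustration under stated assumptions, not the paper's code: keys are distinct integers, working segment $i$ lies between the $(i-1)$-th and $i$-th old key (the outermost segments are bounded by the array ends), and `make_copy` is a hypothetical factory standing in for a fresh copy of $A_{k-1}$ on one working segment.

```python
import bisect
from typing import Callable, Dict, List


def dispatch_phase_one(old_keys: List[int], arrivals: List[int],
                       make_copy: Callable[[int], Callable[[int], None]]) -> Dict[int, int]:
    """Route each arriving key to the working segment it is compatible with.

    old_keys is the sorted list of keys stored at the start of the phase;
    working segment i lies between old_keys[i-1] and old_keys[i] (segments 0
    and len(old_keys) are bounded by the array ends, an assumption).
    make_copy(i) is a hypothetical factory returning the insert routine of an
    independent copy of A_{k-1} running inside working segment i.
    Returns how many keys each working segment received.
    """
    copies: Dict[int, Callable[[int], None]] = {}
    received: Dict[int, int] = {}
    for y in arrivals:
        i = bisect.bisect_left(old_keys, y)  # keys are assumed distinct
        if i not in copies:
            copies[i] = make_copy(i)         # start a copy only when needed
        copies[i](y)                         # old keys are never touched here
        received[i] = received.get(i, 0) + 1
    return received


if __name__ == "__main__":
    buckets: Dict[int, List[int]] = {}
    counts = dispatch_phase_one([10, 20, 30], [15, 5, 25, 17, 35],
                                lambda i: buckets.setdefault(i, []).append)
    print(counts)   # {1: 2, 0: 1, 2: 1, 3: 1}
    print(buckets)  # {1: [15, 17], 0: [5], 2: [25], 3: [35]}
```

Because the total number of arrivals in a round is at most $n_{k-1}(s_{R-1})$, no single copy can receive more keys than it is able to handle.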

For the second phase we need to rearrange the elements to guarantee that the lower bound on
working segment size decreases by at most a factor of 2. Classify a key as new if it was added in
round $R$ and as old otherwise. Say that a segment $S$ is useful if its first and last cells
contain old keys (useful segments are unions of one or more consecutive working segments), and
define the excess of a useful segment to be the number of old keys in it minus the number of new keys.
We need to identify a collection of disjoint useful segments, each having excess exactly one, that cover all
of the new keys. To see that such a collection exists, first note that the entire array is a useful
segment with positive excess. Now choose a collection $\mathcal{S}$ of disjoint useful segments, each with
positive excess, that together cover all new keys, such that $|\mathcal{S}|$ is as large as possible and, subject
to this, the sum of the sizes of the segments in $\mathcal{S}$ is as small as possible. We claim that each
$S \in \mathcal{S}$ has excess exactly one. Consider such an $S$ and assume that its excess is greater than 1.
Let $j_0 < j_1 < \dots < j_t$ be the indices of the cells of $S$ that store old keys. If there are no new keys
between $j_0$ and $j_1$, then we can shrink $S$ to start at $j_1$ (its excess drops by one but stays positive),
contradicting the minimality of $\sum_{T \in \mathcal{S}} |T|$. Thus the segment $[j_0, j_1]$ has excess at most 1.
Let $j_r$ be the largest index such that $[j_0, j_r]$ has excess at most 1; since $S$ itself has excess greater
than 1, we have $r < t$. Then $[j_0, j_{r+1}]$ has excess greater than 1. Extending the segment from $j_r$ to
$j_{r+1}$ adds exactly one old key, so $[j_0, j_r]$ has excess exactly 1 and there are no new keys in
$[j_r, j_{r+1}]$. But then we can split $S$ into $[j_0, j_r]$ and $[j_{r+1}, j_t]$, each of which has positive excess
and which together cover the same new keys; this contradicts the maximality of $|\mathcal{S}|$. So $\mathcal{S}$ has the desired properties.
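The existence argument also suggests a procedure: repeat its two operations, shrinking from the left when the first gap holds no new keys and otherwise cutting off the prefix $[j_0, j_r]$. The sketch below is a hypothetical helper of ours, not the paper's algorithm. It works on the stored keys in array order, labeled `'O'` (old) or `'N'` (new), and assumes the sequence starts and ends with an old key and has positive total excess. Its output need not be the extremal collection from the proof, but every returned segment has excess exactly one, and together the segments cover every new key.

```python
from typing import List, Tuple


def excess_one_cover(labels: str) -> List[Tuple[int, int]]:
    """Disjoint segments of excess exactly one covering every new key.

    labels lists the stored keys in array order: 'O' for old, 'N' for new.
    Returns inclusive (start, end) index pairs into labels; each segment
    starts and ends with an old key. Quadratic time; a sketch only.
    """
    if not labels or labels[0] != "O" or labels[-1] != "O":
        raise ValueError("sequence must start and end with an old key")
    old = [i for i, c in enumerate(labels) if c == "O"]
    # gap[i] = number of new keys strictly between old[i] and old[i + 1]
    gap = [old[i + 1] - old[i] - 1 for i in range(len(old) - 1)]

    def excess(a: int, b: int) -> int:
        # (#old - #new) for the segment from old[a] to old[b]
        return (b - a + 1) - sum(gap[a:b])

    t = len(old) - 1
    if excess(0, t) < 1:
        raise ValueError("total excess must be positive")
    pieces: List[Tuple[int, int]] = []
    a = 0
    while True:                  # invariant: excess(a, t) >= 1
        e = excess(a, t)
        if e == 1:
            pieces.append((a, t))
            break
        if gap[a] == 0:          # no new key right after old[a]: shrink
            a += 1
            continue
        # largest r with excess(a, r) <= 1; as in the proof it equals 1 and
        # no new key lies between old[r] and old[r + 1]
        r = max(b for b in range(a, t + 1) if excess(a, b) <= 1)
        pieces.append((a, r))
        a = r + 1
    # drop pieces that contain no new key (a lone old cell)
    return [(old[i], old[j]) for i, j in pieces if sum(gap[i:j]) > 0]


if __name__ == "__main__":
    print(excess_one_cover("ONOOONO"))  # [(0, 2), (4, 6)]
```

Each step either drops a leading old key with an empty gap after it or cuts along an empty gap, so no new key is ever lost, and the remaining part keeps positive excess.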

Now within each segment $S \in \mathcal{S}$ we redistribute the keys (both old and new) that are internal
to $S$ uniformly within $S$. If $S$ has $u - 1$ internal old keys, then it was originally split into $u$ working
segments, each of size at least $s_{R-1}$. Since both endpoints of $S$ hold old keys and $S$ has excess exactly
one, $S$ contains exactly $u$ new keys, so we now have $2u - 1$ internal keys. These split $S$ into $2u$
working segments (that overlap at their endpoints) for the next round, and after redistributing them
all of these segments have size at least $s_{R-1}/2 = s_R$.

The number of moves needed to accomplish phase 2 within $S$ is $2u - 1$, which is less than twice the
number $u$ of new keys in $S$. Summing over all segments in $\mathcal{S}$ gives an additional amortized cost of
at most 2 per new key. Thus the total amortized cost of $A_k$ is at most $2k - 1$ per item.

It remains to bound the number of items that can be handled by $A_k$.

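Let $r$ denote the number of rounds. Since $1 \le q \le \log_2(m)^{(k-1)/3}$, we have $q + 1 \le 2\log_2(m)^{(k-1)/3}$
and hence $\log_2 s_0 \ge \log_2 m - \frac{k-1}{3}\log_2\log_2 m - 1$. Round $i$ loads $n_{k-1}(s_{i-1})$ items and
$s_{i-1} = s_0/2^{i-1}$, so by induction the number of items inserted during all rounds is

$$\sum_{i=1}^{r} n_{k-1}(s_{i-1}) \ \ge\ \sum_{i=1}^{r} \left(\log_2 \frac{s_0}{2^{i-1}}\right)^{(k-1)/3} = \sum_{i=1}^{r} \left(\log_2 s_0 - i + 1\right)^{(k-1)/3} \ \ge\ r\left(\log_2 s_0 - r + 1\right)^{(k-1)/3} \ \ge\ r\left(\log_2 m - \tfrac{k-1}{3}\log_2\log_2 m - r\right)^{(k-1)/3}.$$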

Let us choose $r = \sqrt{\log_2 m}$. Then

$$r\left(\log_2 m - \tfrac{k-1}{3}\log_2\log_2 m - r\right)^{(k-1)/3} = (\log_2 m)^{(2k+1)/6}\left(1 - \frac{1}{\sqrt{\log_2 m}} - \frac{(k-1)\log_2\log_2 m}{3\log_2 m}\right)^{(k-1)/3}.$$

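Using the fact that $k \le \frac{1}{2}\sqrt{\log_2 m/\log_2\log_2 m}$, we obtain the following lower bound for this expression:

$$(\log_2 m)^{(2k+1)/6}\left(1 - \frac{1}{\sqrt{\log_2 m}} - \frac{1}{6}\sqrt{\frac{\log_2\log_2 m}{\log_2 m}}\right)^{(k-1)/3},$$

which for $m > 2^{16}$ is greater than $\log_2(m)^{2k/6} = \log_2(m)^{k/3}$. Therefore, during all rounds of $A_k$, at least
$\log_2(m)^{k/3}$ keys are loaded, with an amortized cost of $2k - 1$ per insertion.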

□ 
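The chain of inequalities at the end of the proof can be sanity-checked numerically. This is only a spot check, not a substitute for the argument. The sketch works with $L = \log_2 m$ directly, because $m$ itself overflows floating point. It replaces $n_{k-1}(s)$ by $(\log_2 s)^{(k-1)/3}$ as the derivation does, ignores floors, and uses perfect-square $L$ so that $r = \sqrt{L}$ is an integer. It compares the round total $\sum_{i=1}^{r} (\log_2 s_{i-1})^{(k-1)/3}$, the intermediate bound $r(L - \frac{k-1}{3}\log_2 L - r)^{(k-1)/3}$, the closed form $L^{(2k+1)/6}(1 - 1/\sqrt{L} - \frac{1}{6}\sqrt{\log_2 L / L})^{(k-1)/3}$, and the target $L^{k/3}$. All function names are hypothetical.

```python
import math


def round_total(L: int, k: int) -> float:
    """Items loaded in r = isqrt(L) rounds, with n_{k-1}(s) replaced by the
    induction bound (log2 s)^((k-1)/3) and floors ignored. L = log2 m."""
    p = (k - 1) / 3
    q = math.floor(L ** p)          # items loaded before round 1
    log_s0 = L - math.log2(q + 1)   # log2 of s_0 = m / (q + 1)
    r = math.isqrt(L)
    # round i loads n_{k-1}(s_{i-1}) items, and log2 s_{i-1} = log_s0 - (i - 1)
    return sum((log_s0 - (i - 1)) ** p for i in range(1, r + 1))


def middle_bound(L: int, k: int) -> float:
    """r * (L - (k-1)/3 * log2 L - r)^((k-1)/3) with r = isqrt(L)."""
    p = (k - 1) / 3
    r = math.isqrt(L)
    return r * (L - p * math.log2(L) - r) ** p


def closed_form(L: int, k: int) -> float:
    """L^((2k+1)/6) * (1 - 1/sqrt(L) - sqrt(log2 L / L) / 6)^((k-1)/3)."""
    eps = 1 / math.sqrt(L) + math.sqrt(math.log2(L) / L) / 6
    return L ** ((2 * k + 1) / 6) * (1 - eps) ** ((k - 1) / 3)


def k_max(L: int) -> int:
    """Largest integer k with k <= (1/2) * sqrt(L / log2 L)."""
    return math.floor(0.5 * math.sqrt(L / math.log2(L)))


if __name__ == "__main__":
    # perfect squares, so r = sqrt(L) is an integer as in the text
    for L in (1024, 4096, 10**6):
        for k in range(2, k_max(L) + 1):
            assert round_total(L, k) >= middle_bound(L, k)
            assert middle_bound(L, k) >= closed_form(L, k) >= L ** (k / 3)
    print("all checks passed")
```

Small values such as $L = 64$ admit no $k \ge 2$ under the constraint on $k$, which is why the loop starts at $L = 1024$.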